draft-shafranovich-mime-csv-00.txt   draft-shafranovich-mime-csv-01.txt 
Network Working Group Y. Shafranovich Network Working Group Y. Shafranovich
Internet-Draft SolidMatrix Technologies, Inc. Internet-Draft SolidMatrix Technologies, Inc.
Expires: August 6, 2005 February 2, 2005 Expires: August 20, 2005 February 16, 2005
MIME Type for CSV Files Common Format and MIME Type for CSV Files
draft-shafranovich-mime-csv-00.txt draft-shafranovich-mime-csv-01.txt
Status of this Memo Status of this Memo
This document is an Internet-Draft and is subject to all provisions This document is an Internet-Draft and is subject to all provisions
of Section 3 of RFC 3667. By submitting this Internet-Draft, each of Section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with which he or she become aware will be disclosed, in accordance with
RFC 3668. RFC 3668.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 6, 2005. This Internet-Draft will expire on August 20, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document defines MIME types "text/csv" and This document documents the format used for Comma-Separated Values
"text/comma-separated-values" which used for Comma-Separated Values (CSV) files and registers the associated MIME type "text/csv".
(CSV) files.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. MIME Type Registration of text/csv and 2. Definition of the CSV format . . . . . . . . . . . . . . . . . 3
text/comma-separated-values . . . . . . . . . . . . . . . . . 3 3. MIME Type Registration of text/csv . . . . . . . . . . . . . . 5
3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 4 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6
4. Security Considerations . . . . . . . . . . . . . . . . . . . 5 5. Security Considerations . . . . . . . . . . . . . . . . . . . 6
5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6
5.1 Normative References . . . . . . . . . . . . . . . . . . . 5 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
5.2 Informative References . . . . . . . . . . . . . . . . . . 5 7.1 Normative References . . . . . . . . . . . . . . . . . . . 6
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 5 7.2 Informative References . . . . . . . . . . . . . . . . . . 7
A. Appendix A - Discussion of the CSV format . . . . . . . . . . 6 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7
Intellectual Property and Copyright Statements . . . . . . . . 8 A. Status of This Document [To Be Removed Upon Publication] . . . 7
A.1 Discussion Venue . . . . . . . . . . . . . . . . . . . . . 7
A.2 Document Repository . . . . . . . . . . . . . . . . . . . 7
A.3 Document History . . . . . . . . . . . . . . . . . . . . . 8
Intellectual Property and Copyright Statements . . . . . . . . 9
1. Introduction 1. Introduction
The comma separated values format (CSV) has been used for exchanging The comma separated values format (CSV) has been used for exchanging
and converting data between various spreadsheet programs for quite and converting data between various spreadsheet programs for quite
some time. Surprisingly, while this file is very common it has never some time. Surprisingly, while this format is very common it has
been formally documented. Additionally, while the IANA MIME never been formally documented. Additionally, while the IANA MIME
registration tree includes a registraton for registration tree includes a registration for
"text/tab-separated-values" type, no MIME types have ever been "text/tab-separated-values" type, no MIME types have ever been
registered with IANA for CSV. At the same time, various programs and registered with IANA for CSV. At the same time, various programs and
operating systems have begun to use different MIME types for this operating systems have begun to use different MIME types for this
format, many of which vary from system to system. This document format, many of which vary from system to system. This document
seeks to formally register two MIME types for CSV in accordance with seeks to document the format of comma separated values (CSV) files
RFC 2048 [4]. and to formally register the "text/csv" MIME type for CSV in
accordance with RFC 2048 [4].
2. MIME Type Registration of text/csv and text/comma-separated-values 2. Definition of the CSV format
While there are various specifications and implementations for the
CSV format (e.g., [5], [6], [7] and [8]), no formal specification
exists, which allows for a wide variety of interpretations of CSV files.
This section seeks to document the format that seems to be followed
by most implementations:
1. Each record is located on a separate line delimited by a line
break (CRLF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending
linebreak. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There may be an optional header line appearing as the first line
of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file
and will usually contain the same number of fields as the records
in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more
fields, delimited by commas. The last field in the record may or
may not be followed by a comma. For example:
aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes (however
some programs such as Microsoft Excel do not use double quotes at
all). For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
6. Fields containing line breaks (CRLF) and commas should be enclosed
in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then double-quotes
inside fields must be surrounded by double quotes. For example:
"aaa","b"""bb","ccc"
The ABNF grammar [1] appears as follows:
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(VCHAR / CR / LF / CRLF / 3*DQUOTE) DQUOTE
non-escaped = *VCHAR
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [1]
LF = %x0A ;as per section 6.1 of RFC 2234 [1]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [1]
VCHAR = %x21-7E ;as per section 6.1 of RFC 2234 [1]
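The rules and grammar above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (encode_field, encode_record, parse_csv) are hypothetical, rule 7 is read literally so that each embedded double quote is written as three double quotes (the grammar's 3*DQUOTE would also admit longer runs, which this sketch does not treat specially), only CRLF is treated as a record separator, and a trailing comma is taken to introduce an empty final field, as the grammar allows.

```python
from typing import List, Sequence


def encode_field(value: str) -> str:
    """Encode one field; quote it only when rules 6 and 7 require it."""
    if not any(ch in value for ch in (",", "\r", "\n", '"')):
        return value
    # Rule 7, read literally: surround each embedded quote with quotes.
    return '"' + value.replace('"', '"""') + '"'


def encode_record(fields: Sequence[str]) -> str:
    """Join fields with commas (rule 4) and terminate with CRLF (rule 1)."""
    return ",".join(encode_field(f) for f in fields) + "\r\n"


def parse_csv(text: str) -> List[List[str]]:
    """Parse CSV text into a list of records, each a list of field strings."""
    records: List[List[str]] = []
    record: List[str] = []
    field: List[str] = []
    in_quotes = False
    pending = False  # True once the current record has consumed any input
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if text.startswith('"""', i):
                # Rule 7: three double quotes stand for one literal quote.
                field.append('"')
                i += 3
            elif ch == '"':
                in_quotes = False  # closing quote of the enclosed field
                i += 1
            else:
                field.append(ch)  # commas, CR and LF are data here (rule 6)
                i += 1
        elif ch == '"' and not field:
            in_quotes = True  # opening quote of an enclosed field (rule 5)
            pending = True
            i += 1
        elif ch == ",":
            record.append("".join(field))  # rule 4: a comma ends a field
            field = []
            pending = True
            i += 1
        elif text.startswith("\r\n", i):
            record.append("".join(field))  # rule 1: CRLF ends a record
            records.append(record)
            record, field, pending = [], [], False
            i += 2
        else:
            field.append(ch)
            pending = True
            i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if pending:
        # Rule 2: the last record may lack a trailing CRLF.
        record.append("".join(field))
        records.append(record)
    return records
```

For instance, applying parse_csv to the example line from rule 7 returns a single record with the fields aaa, b"bb and ccc, and encode_record followed by parse_csv returns the original fields unchanged.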
3. MIME Type Registration of text/csv
This section provides the media-type registration application (as per This section provides the media-type registration application (as per
RFC 2048 [4]), which will be submitted to IANA after IESG approval of RFC 2048 [4]), which will be submitted to IANA after IESG approval of
this document. this document.
To: ietf-types@iana.org To: ietf-types@iana.org
Subject: Registration of MIME media types text/csv and Subject: Registration of MIME media type text/csv
text/comma-separated-values
MIME media type name: text MIME media type name: text
MIME subtype name: csv, comma-separated-values MIME subtype name: csv
Required parameters: none Required parameters: none
Optional parameters: charset Optional parameters: charset
Common usage of CSV is US-ASCII, but other character sets as Common usage of CSV is US-ASCII, but other character sets as
defined by IANA for the "text" tree may be used. defined by IANA for the "text" tree may be used.
Encoding considerations: Encoding considerations:
While section 4.1.1. of RFC 2046 [1] stipulates that "text" As per section 4.1.1. of RFC 2046 [2], this media type uses CRLF
subtypes MUST use a CRLF sequence as a line break, in practice to denote line breaks. However, implementors should be aware that
that is not always true for CSV. Therefore, implementors should some implementations may use other values.
be aware that either CR or CRLF may be used as a line break for
this format.
Security considerations: Security considerations:
CSV files contain passive text data which should not pose any CSV files contain passive text data which should not pose any
risks. However, it is possible in theory that malicious binary risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns data may be included in order to exploit potential buffer overruns
in the program processing CSV data. Additionally, private data in the program processing CSV data. Additionally, private data
may be shared via this format which of course applies to any text may be shared via this format (which of course applies to any text
data. data).
Interoperability considerations: Interoperability considerations:
Due to lack of a single specification there are considerable Due to lack of a single specification there are considerable
differences among different implementations as described in differences among different implementations. Implementors should
appendix A. The most common difference among various format is "be conservative in what you do, be liberal in what you accept
whether double quotes (") are used to enclose strings. from others" (RFC 793 [3]) when processing CSV files. An attempt
Implementors should "be conservative in what you do, be liberal in at a common definition can be found in Section 2.
what you accept from others" (RFC 793 [2]) when processing CSV
files.
Published specification: Published specification:
While numerous private specifications exist for various programs While numerous private specifications exist for various programs
and systems, there is no single "master" specification for this and systems, there is no single "master" specification for this
format. A sampling of formats and discussion of differences is format. An attempt at documenting a common definition can be
included in appendix A. found in Section 2.
Applications which use this media type: Applications which use this media type:
Spreadsheet programs and various data conversion utilities Spreadsheet programs and various data conversion utilities
Additional information: Additional information:
Magic number(s): none Magic number(s): none
File extension(s): CSV File extension(s): CSV
skipping to change at page 4, line 46 skipping to change at page 6, line 27
Macintosh File Type Code(s): TEXT Macintosh File Type Code(s): TEXT
Person & email address to contact for further information: Person & email address to contact for further information:
Yakov Shafranovich <ietf@shaftek.org> Yakov Shafranovich <ietf@shaftek.org>
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: IESG Author/Change controller: IESG
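As an illustration of how the registered type and its optional "charset" parameter might be used, the following minimal sketch labels CSV data as "text/csv" with Python's standard email package. The sample data and the subject line are assumptions made for the example and are not part of the registration.

```python
from email.message import EmailMessage

# Sample CSV body using CRLF line breaks, as the registration describes.
csv_text = "aaa,bbb,ccc\r\nzzz,yyy,xxx\r\n"

msg = EmailMessage()
msg["Subject"] = "CSV sample"  # hypothetical subject, for illustration only
# subtype="csv" produces a Content-Type of text/csv; charset is the
# optional parameter named in the registration.
msg.set_content(csv_text, subtype="csv", charset="us-ascii")

content_type = msg.get_content_type()
charset = msg.get_content_charset()
```

Here content_type holds the media type string and charset holds the declared character set, which a receiving program could use to decide how to decode and parse the body.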
3. IANA Considerations 4. IANA Considerations
After IESG approval, IANA is expected to register these two types After IESG approval, IANA is expected to register the MIME type
"text/csv" and "text/comma-separated-values" using the application "text/csv" using the application provided in Section 3 of this
provided in this document. document.
4. Security Considerations 5. Security Considerations
See discussion above See discussion above
5. References 6. Acknowledgments
5.1 Normative References The author would like to thank Dave Crocker, Martin Duerst and Bruce
Lilly for their helpful suggestions. A special word of thanks to
Dave for helping with the ABNF grammar.
[1] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 7. References
7.1 Normative References
[1] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[2] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996. 1996.
[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [3] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. September 1981.
[3] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax 7.2 Informative References
Specifications: ABNF", RFC 2234, November 1997.
5.2 Informative References
[4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet [4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures", Mail Extensions (MIME) Part Four: Registration Procedures",
BCP 13, RFC 2048, November 1996. BCP 13, RFC 2048, November 1996.
[5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File [5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File
Format", 2004, Format", 2004,
<http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>. <http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>.
[6] Edoceo, Inc., "CSV Standard File Format", 2004, [6] Edoceo, Inc., "CSV Standard File Format", 2004,
skipping to change at page 6, line 5 skipping to change at page 7, line 38
<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>. <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>.
Author's Address Author's Address
Yakov Shafranovich Yakov Shafranovich
SolidMatrix Technologies, Inc. SolidMatrix Technologies, Inc.
Email: ietf@shaftek.org Email: ietf@shaftek.org
URI: http://www.shaftek.org URI: http://www.shaftek.org
Appendix A. Appendix A - Discussion of the CSV format Appendix A. Status of This Document [To Be Removed Upon Publication]
While there are various specifications and implementations for the
CSV format (e.g., [5], [6], [7] and [8]), no formal specification
exists. This causes a wide variety of interpretations for CSV files.
While this document does not seek to document the CSV format,
nevertheless we want to document the format that seems to be followed
by most implementations:
1. Each record is located on a separate line delimited by a line
break (either CR or CR/LF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending
linebreak. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There may be an optional header line appearing as the first line
of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file
and will usually contain the same number of fields as the records
in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more
fields, delimited by commas. The last field in the record may or
may not be followed by a comma. For example:
aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes, however
some programs such as Microsoft Excel do not use double quotes at
all. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
6. Fields containing line breaks (CR or CR/LF) and commas should be
enclosed in double-quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then double-quotes
inside fields must be surrounded by double quotes. For example:
"aaa","b"""bb","ccc"
8. Whitespace immediately before and after commas may be removed
unless it appears inside double-quotes. For example:
zzz, yyy , xxx
would be processed as if it was:
zzz,yyy,xxx
The ABNF grammar [3] appears as follows:
COMMA = %x2C
file = [header] *record
end-of-field = COMMA / (CR / CRLF)
header = *(*WSP field *WSP end-of-field)
record = *(*WSP field *WSP end-of-field)
field = escaped / non-escaped
escaped = DQUOTE *(VCHAR / CR / CRLF / 3*DQUOTE) DQUOTE
non-escaped = *VCHAR
A.1 Discussion Venue
Discussion about this document should be directed to the IETF-TYPES
mailing list <http://www.alvestrand.no/mailman/listinfo/ietf-types/>
which is also reachable via <ietf-types@iana.org>. Of course,
comments directly to the author are always welcome.
A.2 Document Repository
Copies of this and earlier versions including multiple formats can be
found at <http://www.shaftek.org/publications/drafts/mime-csv/>.
A.3 Document History
Changes from draft-shafranovich-mime-csv-00 to
draft-shafranovich-mime-csv-01:
o Type "text/comma-separated-values" has been removed
o The "encoding consideration" paragraph of Section 3 has been
changed to allow CRLF only as per section 4.1.1. of RFC 2046 [2].
This has been reflected in the ABNF grammar in Section 2.
o ABNF grammar in Section 2 has been cleaned up.
o Acknowledgements and status sections were added.
o CSV format definition was moved to the normative section of the
document
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/