draft-shafranovich-mime-csv-02.txt   draft-shafranovich-mime-csv-03.txt 
Network Working Group Y. Shafranovich Network Working Group Y. Shafranovich
Internet-Draft SolidMatrix Technologies, Inc. Internet-Draft SolidMatrix Technologies, Inc.
Expires: August 22, 2005 February 18, 2005 Expires: September 24, 2005 March 23, 2005
Common Format and MIME Type for CSV Files Common Format and MIME Type for CSV Files
draft-shafranovich-mime-csv-02.txt draft-shafranovich-mime-csv-03.txt
Status of this Memo Status of this Memo
This document is an Internet-Draft and is subject to all provisions This document is an Internet-Draft and is subject to all provisions
of Section 3 of RFC 3667. By submitting this Internet-Draft, each of Section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with which he or she become aware will be disclosed, in accordance with
RFC 3668. RFC 3668.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 22, 2005. This Internet-Draft will expire on September 24, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes the format used for Comma-Separated Values This document describes the format used for Comma-Separated Values
(CSV) files and registers the associated MIME type "text/csv". (CSV) files and registers the associated MIME type "text/csv".
skipping to change at page 3, line 18 skipping to change at page 3, line 18
and converting data between various spreadsheet programs for quite and converting data between various spreadsheet programs for quite
some time. Surprisingly, while this format is very common, it has some time. Surprisingly, while this format is very common, it has
never been formally documented. Additionally, while the IANA MIME never been formally documented. Additionally, while the IANA MIME
registration tree includes a registration for registration tree includes a registration for
"text/tab-separated-values" type, no MIME types have ever been "text/tab-separated-values" type, no MIME types have ever been
registered with IANA for CSV. At the same time, various programs and registered with IANA for CSV. At the same time, various programs and
operating systems have begun to use different MIME types for this operating systems have begun to use different MIME types for this
format, many of which vary from system to system. This document format, many of which vary from system to system. This document
seeks to document the format of comma separated values (CSV) files seeks to document the format of comma separated values (CSV) files
and to formally register the "text/csv" MIME type for CSV in and to formally register the "text/csv" MIME type for CSV in
accordance with RFC 2048 [4]. accordance with RFC 2048 [1].
2. Definition of the CSV format 2. Definition of the CSV format
While there are various specifications and implementations for the While there are various specifications and implementations for the
CSV format (e.g., [5], [6], [7] and [8]), no formal specification CSV format (e.g., [4], [5], [6] and [7]), no formal specification
exists, which has led to a wide variety of interpretations of CSV files. exists, which has led to a wide variety of interpretations of CSV files.
This section seeks to document the format that seems to be followed This section seeks to document the format that seems to be followed
by most implementations: by most implementations:
1. Each record is located on a separate line delimited by a line 1. Each record is located on a separate line delimited by a line
break (CRLF). For example: break (CRLF). For example:
aaa,bbb,ccc CRLF aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF zzz,yyy,xxx CRLF
skipping to change at page 3, line 51 skipping to change at page 3, line 51
of the file with the same format as normal record lines. This of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file header will contain names corresponding to the fields in the file
and will usually contain the same number of fields as the records and will usually contain the same number of fields as the records
in the rest of the file. For example: in the rest of the file. For example:
field_name,field_name,field_name CRLF field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more 4. Within the header and each record there may be one or more
fields, delimited by commas. The last field in the record may or fields, separated by commas. The last field in the record must
may not be followed by a comma. For example: not be followed by a comma. For example:
aaa,bbb,ccc aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes (however 5. Each field may or may not be enclosed in double quotes (however
some programs such as Microsoft Excel do not use double quotes at some programs such as Microsoft Excel do not use double quotes at
all). For example: all). For example:
"aaa","bbb","ccc" CRLF "aaa","bbb","ccc" CRLF
zzz,yyy,xxx zzz,yyy,xxx
6. Fields containing line breaks (CRLF) and commas should be enclosed 6. Fields containing line breaks (CRLF) and commas should be enclosed
in double-quotes. For example: in double-quotes. For example:
"aaa","b CRLF "aaa","b CRLF
bb","ccc" CRLF bb","ccc" CRLF
zzz,yyy,xxx zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then double-quotes 7. If double-quotes are used to enclose fields, then a double-quote
inside fields must be surrounded by double quotes. For example: appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b"""bb","ccc" "aaa","b""bb","ccc"
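Rules 4 to 7 can also be read from the producer side. The Python sketch below follows the draft-03 (right-hand) wording: fields are joined with commas and no trailing comma, each line ends in CRLF, a field is enclosed in double quotes only when it needs to be, and embedded double quotes are doubled. Quoting any field that contains a double quote is an assumption made for safety, because rule 7 only covers fields that are already enclosed. The names `quote_field`, `format_record` and `format_file` are hypothetical.

```python
from typing import List, Optional

# Characters that force a field to be quoted. Comma, CR and LF come from
# rule 6; the double quote is an added assumption (rule 7 only says how to
# escape quotes inside a field that is already enclosed in quotes).
SPECIAL_CHARS = (",", "\r", "\n", '"')


def quote_field(field: str) -> str:
    """Return one field ready to be placed in a CSV record."""
    if not any(ch in field for ch in SPECIAL_CHARS):
        return field  # rule 5: quoting is optional for plain fields
    # Rule 7 (draft-03): each embedded double quote is doubled.
    return '"' + field.replace('"', '""') + '"'


def format_record(fields: List[str]) -> str:
    # Rule 4 (draft-03): fields are separated by commas, no trailing comma.
    return ",".join(quote_field(f) for f in fields)


def format_file(records: List[List[str]], header: Optional[List[str]] = None) -> str:
    # Rule 1: each record on its own line terminated by CRLF.
    # Rule 3: an optional header line in the same format as the records.
    rows = ([header] if header is not None else []) + records
    return "".join(format_record(row) + "\r\n" for row in rows)
```

Always quoting every field would also satisfy rule 5, so minimal quoting is only a design choice here. For these examples, Python's standard `csv` module with its default `excel` dialect (`doublequote=True`, `lineterminator='\r\n'`) produces the same output.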
The ABNF grammar [1] appears as follows: The ABNF grammar [2] appears as follows:
file = [header CRLF] record *(CRLF record) [CRLF] file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name) header = name *(COMMA name)
record = field *(COMMA field) record = field *(COMMA field)
name = field name = field
field = (escaped / non-escaped) field = (escaped / non-escaped)
escaped = DQUOTE *(VCHAR / CR / LF / CRLF / 3*DQUOTE) DQUOTE escaped = DQUOTE *(VCHAR / CR / LF / CRLF / 2*DQUOTE) DQUOTE
non-escaped = *VCHAR non-escaped = *VCHAR
COMMA = %x2C COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [1] CR = %x0D ;as per section 6.1 of RFC 2234 [2]
DQUOTE = %x22 ;as per section 6.1 of RFC 2234 [1] DQUOTE = %x22 ;as per section 6.1 of RFC 2234 [2]
LF = %x0A ;as per section 6.1 of RFC 2234 [1] LF = %x0A ;as per section 6.1 of RFC 2234 [2]
CRLF = CR LF ;as per section 6.1 of RFC 2234 [1] CRLF = CR LF ;as per section 6.1 of RFC 2234 [2]
VCHAR = %x21-7E ;as per section 6.1 of RFC 2234 [1] VCHAR = %x21-7E ;as per section 6.1 of RFC 2234 [2]
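The grammar can be exercised with a small hand-written reader. The Python sketch below is a state machine, not a literal transcription of the ABNF: as written, VCHAR (%x21-7E) also matches COMMA and DQUOTE, so the grammar alone does not settle where a non-escaped field ends. The sketch follows the draft-03 escaping (2*DQUOTE). It is also deliberately more permissive than the grammar, in the spirit of "be liberal in what you accept": it accepts spaces (%x20, which VCHAR excludes) and a lone CR or LF as a record separator. The name `parse_csv` is hypothetical.

```python
from typing import List


def parse_csv(text: str) -> List[List[str]]:
    """Split CSV text into records, each a list of field strings."""
    records: List[List[str]] = []
    record: List[str] = []
    field: List[str] = []   # characters of the field being read
    in_quotes = False       # inside an escaped (quoted) field?
    started = False         # has the current record consumed any input?
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # 2*DQUOTE (draft-03) -> one literal quote
                    i += 2
                    continue
                in_quotes = False      # closing DQUOTE of the escaped field
            else:
                field.append(c)        # commas, CR and LF are literal here
            i += 1
            continue
        if c in "\r\n":
            # End of record. CRLF is the registered line break; a lone CR or
            # LF is also accepted (assumption: liberal in what we accept).
            record.append("".join(field))
            records.append(record)
            record, field, started = [], [], False
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                 # consume the LF of a CRLF pair
        else:
            started = True
            if c == '"' and not field:
                in_quotes = True       # opening DQUOTE of an escaped field
            elif c == ",":
                record.append("".join(field))  # COMMA ends the current field
                field = []
            else:
                field.append(c)        # non-escaped data (any character accepted)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if started:
        record.append("".join(field))  # last record without a trailing CRLF
        records.append(record)
    return records
```

A trailing CRLF after the last record does not produce an extra empty record, which matches the optional `[CRLF]` at the end of the `file` rule.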
3. MIME Type Registration of text/csv 3. MIME Type Registration of text/csv
This section provides the media-type registration application (as per This section provides the media-type registration application (as per
RFC 2048 [4]), which will be submitted to IANA after IESG approval of RFC 2048 [1]), which will be submitted to IANA after IESG approval of
this document. this document.
To: ietf-types@iana.org To: ietf-types@iana.org
Subject: Registration of MIME media type text/csv Subject: Registration of MIME media type text/csv
MIME media type name: text MIME media type name: text
MIME subtype name: csv MIME subtype name: csv
Required parameters: none Required parameters: none
Optional parameters: charset Optional parameters: charset
Common usage of CSV is US-ASCII, but other character sets as Common usage of CSV is US-ASCII, but other character sets as
defined by IANA for the "text" tree may be used. defined by IANA for the "text" tree may be used.
Encoding considerations: Encoding considerations:
As per section 4.1.1. of RFC 2046 [2], this media type uses CRLF As per section 4.1.1. of RFC 2046 [3], this media type uses CRLF
to denote line breaks. However, implementors should be aware that to denote line breaks. However, implementors should be aware that
some implementations may use other values. some implementations may use other values.
Security considerations: Security considerations:
CSV files contain passive text data which should not pose any CSV files contain passive text data which should not pose any
risks. However, it is possible in theory that malicious binary risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns data may be included in order to exploit potential buffer overruns
in the program processing CSV data. Additionally, private data in the program processing CSV data. Additionally, private data
may be shared via this format (which of course applies to any text may be shared via this format (which of course applies to any text
data). data).
Interoperability considerations: Interoperability considerations:
Due to the lack of a single specification, there are considerable Due to the lack of a single specification, there are considerable
differences among different implementations. Implementors should differences among different implementations. Implementors should
"be conservative in what you do, be liberal in what you accept "be conservative in what you do, be liberal in what you accept
from others" (RFC 793 [3]) when processing CSV files. An attempt from others" (RFC 793 [8]) when processing CSV files. An attempt
at a common definition can be found in Section 2. at a common definition can be found in Section 2.
Published specification: Published specification:
While numerous private specifications exist for various programs While numerous private specifications exist for various programs
and systems, there is no single "master" specification for this and systems, there is no single "master" specification for this
format. An attempt at a common definition can be found in format. An attempt at a common definition can be found in
Section 2. Section 2.
Applications which use this media type: Applications which use this media type:
skipping to change at page 6, line 39 skipping to change at page 6, line 39
After IESG approval, IANA is expected to register the MIME type After IESG approval, IANA is expected to register the MIME type
"text/csv" using the application provided in Section 3 of this "text/csv" using the application provided in Section 3 of this
document. document.
5. Security Considerations 5. Security Considerations
See the discussion above. See the discussion above.
6. Acknowledgments 6. Acknowledgments
The author would like to thank Dave Crocker, Martin Duerst and Bruce The author would like to thank Dave Crocker, Martin Duerst, Clyde
Lilly for their helpful suggestions. A special word of thanks to Ingram, Graham Klyne, Bruce Lilly and Chris Lilley for their helpful
Dave for helping with the ABNF grammar. suggestions. A special word of thanks to Dave for helping with the
ABNF grammar.
7. References 7. References
7.1 Normative References 7.1 Normative References
[1] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax [1] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures",
BCP 13, RFC 2048, November 1996.
[2] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997. Specifications: ABNF", RFC 2234, November 1997.
[2] Freed, N. and N. Borenstein, "Multipurpose Internet Mail [3] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996. 1996.
[3] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
7.2 Informative References 7.2 Informative References
[4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet [4] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File
Mail Extensions (MIME) Part Four: Registration Procedures",
BCP 13, RFC 2048, November 1996.
[5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File
Format", 2004, Format", 2004,
<http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>. <http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>.
[6] Edoceo, Inc., "CSV Standard File Format", 2004, [5] Edoceo, Inc., "CSV Standard File Format", 2004,
<http://www.edoceo.com/utilis/csv-file-format.php>. <http://www.edoceo.com/utilis/csv-file-format.php>.
[7] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV [6] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV
Manager", February 2005, Manager", February 2005,
<http://www.ricebridge.com/products/csvman/reference.htm>. <http://www.ricebridge.com/products/csvman/reference.htm>.
[8] Raymond, E., "The Art of Unix Programming, Chapter 5", September [7] Raymond, E., "The Art of Unix Programming, Chapter 5", September
2003, 2003,
<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>. <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>.
[8] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
Author's Address Author's Address
Yakov Shafranovich Yakov Shafranovich
SolidMatrix Technologies, Inc. SolidMatrix Technologies, Inc.
Email: ietf@shaftek.org Email: ietf@shaftek.org
URI: http://www.shaftek.org URI: http://www.shaftek.org
Appendix A. Status of This Document [To Be Removed Upon Publication] Appendix A. Status of This Document [To Be Removed Upon Publication]
skipping to change at page 8, line 7 skipping to change at page 8, line 8
which is also reachable via <ietf-types@iana.org>. Of course, which is also reachable via <ietf-types@iana.org>. Of course,
comments directly to the author are always welcome. comments directly to the author are always welcome.
A.2 Document Repository A.2 Document Repository
Copies of this and earlier versions including multiple formats can be Copies of this and earlier versions including multiple formats can be
found at <http://www.shaftek.org/publications/drafts/mime-csv/>. found at <http://www.shaftek.org/publications/drafts/mime-csv/>.
A.3 Document History A.3 Document History
Changes from draft-shafranovich-mime-csv-02 to
draft-shafranovich-mime-csv-03:
o Changed text to prohibit the last field ending with a comma
matching the ABNF grammar
o The double quote escaping is now set to two double quotes instead
of three
o Moved some of the references between informative and normative
sections
Changes from draft-shafranovich-mime-csv-01 to Changes from draft-shafranovich-mime-csv-01 to
draft-shafranovich-mime-csv-00: draft-shafranovich-mime-csv-02:
o Minor errors in ABNF grammar corrected in response to AD comments o Minor errors in ABNF grammar corrected in response to AD comments
o Minor spelling mistakes corrected o Minor spelling mistakes corrected
Changes from draft-shafranovich-mime-csv-00 to Changes from draft-shafranovich-mime-csv-00 to
draft-shafranovich-mime-csv-01: draft-shafranovich-mime-csv-01:
o Type "text/comma-separated-values" has been removed o Type "text/comma-separated-values" has been removed
o The "encoding consideration" paragraph of Section 3 has been o The "encoding consideration" paragraph of Section 3 has been
changed to allow CRLF only as per section 4.1.1. of RFC 2046 [2]. changed to allow CRLF only as per section 4.1.1. of RFC 2046 [3].
This has been reflected in the ABNF grammar in Section 2. This has been reflected in the ABNF grammar in Section 2.
o ABNF grammar in Section 2 has been cleaned up. o ABNF grammar in Section 2 has been cleaned up.
o Acknowledgements and status sections were added. o Acknowledgements and status sections were added.
o CSV format definition was moved to the normative section of the o CSV format definition was moved to the normative section of the
document document
Intellectual Property Statement Intellectual Property Statement
End of changes.

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/