| draft-shafranovich-mime-csv-00.txt | draft-shafranovich-mime-csv-01.txt | |||
|---|---|---|---|---|
| Network Working Group Y. Shafranovich | Network Working Group Y. Shafranovich | |||
| Internet-Draft SolidMatrix Technologies, Inc. | Internet-Draft SolidMatrix Technologies, Inc. | |||
| Expires: August 6, 2005 February 2, 2005 | Expires: August 20, 2005 February 16, 2005 | |||
| MIME Type for CSV Files | Common Format and MIME Type for CSV Files | |||
| draft-shafranovich-mime-csv-00.txt | draft-shafranovich-mime-csv-01.txt | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft and is subject to all provisions | This document is an Internet-Draft and is subject to all provisions | |||
| of Section 3 of RFC 3667. By submitting this Internet-Draft, each | of Section 3 of RFC 3667. By submitting this Internet-Draft, each | |||
| author represents that any applicable patent or other IPR claims of | author represents that any applicable patent or other IPR claims of | |||
| which he or she is aware have been or will be disclosed, and any of | which he or she is aware have been or will be disclosed, and any of | |||
| which he or she become aware will be disclosed, in accordance with | which he or she become aware will be disclosed, in accordance with | |||
| RFC 3668. | RFC 3668. | |||
| skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on August 6, 2005. | This Internet-Draft will expire on August 20, 2005. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2005). | Copyright (C) The Internet Society (2005). | |||
| Abstract | Abstract | |||
| This document defines MIME types "text/csv" and | This document documents the format used for Comma-Separated Values | |||
| "text/comma-separated-values" which are used for Comma-Separated Values | (CSV) files and registers the associated MIME type "text/csv". | |||
| (CSV) files. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. MIME Type Registration of text/csv and | 2. Definition of the CSV format . . . . . . . . . . . . . . . . . 3 | |||
| text/comma-separated-values . . . . . . . . . . . . . . . . . 3 | 3. MIME Type Registration of text/csv . . . . . . . . . . . . . . 5 | |||
| 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 4 | 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 6 | |||
| 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5.1 Normative References . . . . . . . . . . . . . . . . . . . 5 | 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5.2 Informative References . . . . . . . . . . . . . . . . . . 5 | 7.1 Normative References . . . . . . . . . . . . . . . . . . . 6 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . 5 | 7.2 Informative References . . . . . . . . . . . . . . . . . . 7 | |||
| A. Appendix A - Discussion of the CSV format . . . . . . . . . . 6 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 8 | A. Status of This Document [To Be Removed Upon Publication] . . . 7 | |||
| A.1 Discussion Venue . . . . . . . . . . . . . . . . . . . . . 7 | ||||
| A.2 Document Repository . . . . . . . . . . . . . . . . . . . 7 | ||||
| A.3 Document History . . . . . . . . . . . . . . . . . . . . . 8 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . 9 | ||||
| 1. Introduction | 1. Introduction | |||
| The comma separated values format (CSV) has been used for exchanging | The comma separated values format (CSV) has been used for exchanging | |||
| and converting data between various spreadsheet programs for quite | and converting data between various spreadsheet programs for quite | |||
| some time. Surprisingly, while this file is very common it has never | some time. Surprisingly, while this format is very common it has | |||
| been formally documented. Additionally, while the IANA MIME | never been formally documented. Additionally, while the IANA MIME | |||
| registration tree includes a registraton for | registration tree includes a registration for | |||
| "text/tab-separated-values" type, no MIME types have ever been | "text/tab-separated-values" type, no MIME types have ever been | |||
| registered with IANA for CSV. At the same time, various programs and | registered with IANA for CSV. At the same time, various programs and | |||
| operating systems have begun to use different MIME types for this | operating systems have begun to use different MIME types for this | |||
| format, many of which vary from system to system. This document | format, many of which vary from system to system. This document | |||
| seeks to formally register two MIME types for CSV in accordance with | seeks to document the format of comma separated values (CSV) files | |||
| RFC 2048 [4]. | and to formally register the "text/csv" MIME type for CSV in | |||
| accordance with RFC 2048 [4]. | ||||
| 2. MIME Type Registration of text/csv and text/comma-separated-values | 2. Definition of the CSV format | |||
| While there are various specifications and implementations for the | ||||
| CSV format (e.g., [5], [6], [7] and [8]), no formal specification | ||||
| exists, which allows for a wide variety of interpretations of CSV files. | ||||
| This section seeks to document the format that seems to be followed | ||||
| by most implementations: | ||||
| 1. Each record is located on a separate line delimited by a line | ||||
| break (CRLF). For example: | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx CRLF | ||||
| 2. The last record in the file may or may not have an ending | ||||
| linebreak. For example: | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx | ||||
| 3. There may be an optional header line appearing as the first line | ||||
| of the file with the same format as normal record lines. This | ||||
| header will contain names corresponding to the fields in the file | ||||
| and will usually contain the same number of fields as the records | ||||
| in the rest of the file. For example: | ||||
| field_name,field_name,field_name CRLF | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx CRLF | ||||
| 4. Within the header and each record there may be one or more | ||||
| fields, delimited by commas. The last field in the record may or | ||||
| may not be followed by a comma. For example: | ||||
| aaa,bbb,ccc | ||||
| 5. Each field may or may not be enclosed in double quotes (however | ||||
| some programs such as Microsoft Excel do not use double quotes at | ||||
| all). For example: | ||||
| "aaa","bbb","ccc" CRLF | ||||
| zzz,yyy,xxx | ||||
| 6. Fields containing line breaks (CRLF) and commas should be enclosed | ||||
| in double-quotes. For example: | ||||
| "aaa","b CRLF | ||||
| bb","ccc" CRLF | ||||
| zzz,yyy,xxx | ||||
| 7. If double-quotes are used to enclose fields, then double-quotes | ||||
| inside fields must be surrounded by double quotes. For example: | ||||
| "aaa","b"""bb","ccc" | ||||
| The ABNF grammar [1] appears as follows: | ||||
| file = [header CRLF] record *(CRLF record) [CRLF] | ||||
| header = name *(COMMA name) | ||||
| record = field *(COMMA field) | ||||
| name = field | ||||
| field = (escaped / non-escaped) | ||||
| escaped = DQUOTE *(VCHAR / CR / LF / CRLF / 3*DQUOTE) DQUOTE | ||||
| non-escaped = *VCHAR | ||||
| COMMA = %x2C | ||||
| CR = %x0D ;as per section 6.1 of RFC 2234 [1] | ||||
| LF = %x0A ;as per section 6.1 of RFC 2234 [1] | ||||
| CRLF = CR LF ;as per section 6.1 of RFC 2234 [1] | ||||
| VCHAR = %x21-7E ;as per section 6.1 of RFC 2234 [1] | ||||
| 3. MIME Type Registration of text/csv | ||||
| This section provides the media-type registration application (as per | This section provides the media-type registration application (as per | |||
| RFC 2048 [4]), which will be submitted to IANA after IESG approval of | RFC 2048 [4]), which will be submitted to IANA after IESG approval of | |||
| this document. | this document. | |||
| To: ietf-types@iana.org | To: ietf-types@iana.org | |||
| Subject: Registration of MIME media types text/csv and | Subject: Registration of MIME media type text/csv | |||
| text/comma-separated-values | ||||
| MIME media type name: text | MIME media type name: text | |||
| MIME subtype name: csv, comma-separated-values | MIME subtype name: csv | |||
| Required parameters: none | Required parameters: none | |||
| Optional parameters: charset | Optional parameters: charset | |||
| Common usage of CSV is US-ASCII, but other character sets as | Common usage of CSV is US-ASCII, but other character sets as | |||
| defined by IANA for the "text" tree may be used. | defined by IANA for the "text" tree may be used. | |||
| Encoding considerations: | Encoding considerations: | |||
| While section 4.1.1. of RFC 2046 [1] stipulates that "text" | As per section 4.1.1. of RFC 2046 [2], this media type uses CRLF | |||
| subtypes MUST use a CRLF sequence as a line break, in practice | to denote line breaks. However, implementors should be aware that | |||
| that is not always true for CSV. Therefore, implementors should | some implementations may use other values. | |||
| be aware that either CR or CRLF may be used as a line break for | ||||
| this format. | ||||
| Security considerations: | Security considerations: | |||
| CSV files contain passive text data which should not pose any | CSV files contain passive text data which should not pose any | |||
| risks. However, it is possible in theory that malicious binary | risks. However, it is possible in theory that malicious binary | |||
| data may be included in order to exploit potential buffer overruns | data may be included in order to exploit potential buffer overruns | |||
| in the program processing CSV data. Additionally, private data | in the program processing CSV data. Additionally, private data | |||
| may be shared via this format which of course applies to any text | may be shared via this format (which of course applies to any text | |||
| data. | data). | |||
| Interoperability considerations: | Interoperability considerations: | |||
| Due to lack of a single specification there are considerable | Due to lack of a single specification there are considerable | |||
| differences among different implementations as described in | differences among different implementations. Implementors should | |||
| appendix A. The most common difference among various format is | "be conservative in what you do, be liberal in what you accept | |||
| whether double quotes (") are used to enclose strings. | from others" (RFC 793 [3]) when processing CSV files. An attempt | |||
| Implementors should "be conservative in what you do, be liberal in | at a common definition can be found in Section 2. | |||
| what you accept from others" (RFC 793 [2]) when processing CSV | ||||
| files. | ||||
| Published specification: | Published specification: | |||
| While numerous private specifications exist for various programs | While numerous private specifications exist for various programs | |||
| and systems, there is no single "master" specification for this | and systems, there is no single "master" specification for this | |||
| format. A sampling of formats and discussion of differences is | format. An attempt at documenting a common definition can be | |||
| included in appendix A. | found in Section 2. | |||
| Applications which use this media type: | Applications which use this media type: | |||
| Spreadsheet programs and various data conversion utilities | Spreadsheet programs and various data conversion utilities | |||
| Additional information: | Additional information: | |||
| Magic number(s): none | Magic number(s): none | |||
| File extension(s): CSV | File extension(s): CSV | |||
| skipping to change at page 4, line 46 | skipping to change at page 6, line 27 | |||
| Macintosh File Type Code(s): TEXT | Macintosh File Type Code(s): TEXT | |||
| Person & email address to contact for further information: | Person & email address to contact for further information: | |||
| Yakov Shafranovich <ietf@shaftek.org> | Yakov Shafranovich <ietf@shaftek.org> | |||
| Intended usage: COMMON | Intended usage: COMMON | |||
| Author/Change controller: IESG | Author/Change controller: IESG | |||
| 3. IANA Considerations | 4. IANA Considerations | |||
| After IESG approval, IANA is expected to register these two types | After IESG approval, IANA is expected to register the MIME type | |||
| "text/csv" and "text/comma-separated-values" using the application | "text/csv" using the application provided in Section 3 of this | |||
| provided in this document. | document. | |||
| 4. Security Considerations | 5. Security Considerations | |||
| See discussion above | See discussion above | |||
| 5. References | 6. Acknowledgments | |||
| 5.1 Normative References | The author would like to thank Dave Crocker, Martin Duerst and Bruce | |||
| Lilly for their helpful suggestions. A special word of thanks to | ||||
| Dave for helping with the ABNF grammar. | ||||
| [1] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | 7. References | |||
| 7.1 Normative References | ||||
| [1] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax | ||||
| Specifications: ABNF", RFC 2234, November 1997. | ||||
| [2] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | ||||
| Extensions (MIME) Part Two: Media Types", RFC 2046, November | Extensions (MIME) Part Two: Media Types", RFC 2046, November | |||
| 1996. | 1996. | |||
| [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, | [3] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, | |||
| September 1981. | September 1981. | |||
| [3] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax | 7.2 Informative References | |||
| Specifications: ABNF", RFC 2234, November 1997. | ||||
| 5.2 Informative References | ||||
| [4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet | [4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet | |||
| Mail Extensions (MIME) Part Four: Registration Procedures", | Mail Extensions (MIME) Part Four: Registration Procedures", | |||
| BCP 13, RFC 2048, November 1996. | BCP 13, RFC 2048, November 1996. | |||
| [5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File | [5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File | |||
| Format", 2004, | Format", 2004, | |||
| <http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>. | <http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>. | |||
| [6] Edoceo, Inc., "CSV Standard File Format", 2004, | [6] Edoceo, Inc., "CSV Standard File Format", 2004, | |||
| skipping to change at page 6, line 5 | skipping to change at page 7, line 38 | |||
| <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>. | <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>. | |||
| Author's Address | Author's Address | |||
| Yakov Shafranovich | Yakov Shafranovich | |||
| SolidMatrix Technologies, Inc. | SolidMatrix Technologies, Inc. | |||
| Email: ietf@shaftek.org | Email: ietf@shaftek.org | |||
| URI: http://www.shaftek.org | URI: http://www.shaftek.org | |||
| Appendix A. Appendix A - Discussion of the CSV format | Appendix A. Status of This Document [To Be Removed Upon Publication] | |||
| While there are various specifications and implementations for the | ||||
| CSV format (e.g., [5], [6], [7] and [8]), no formal specification | ||||
| exists. This causes a wide variety of interpretations for CSV files. | ||||
| While this document does not seek to document the CSV format, | ||||
| nevertheless we want to document the format that seems to be followed | ||||
| by most implementations: | ||||
| 1. Each record is located on a separate line delimited by a line | ||||
| break (either CR or CR/LF). For example: | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx CRLF | ||||
| 2. The last record in the file may or may not have an ending | ||||
| linebreak. For example: | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx | ||||
| 3. There may be an optional header line appearing as the first line | ||||
| of the file with the same format as normal record lines. This | ||||
| header will contain names corresponding to the fields in the file | ||||
| and will usually contain the same number of fields as the records | ||||
| in the rest of the file. For example: | ||||
| field_name,field_name,field_name CRLF | ||||
| aaa,bbb,ccc CRLF | ||||
| zzz,yyy,xxx CRLF | ||||
| 4. Within the header and each record there may be one or more | ||||
| fields, delimited by commas. The last field in the record may or | ||||
| may not be followed by a comma. For example: | ||||
| aaa,bbb,ccc | ||||
| 5. Each field may or may not be enclosed in double quotes, however | ||||
| some programs such as Microsoft Excel do not use double quotes at | ||||
| all. For example: | ||||
| "aaa","bbb","ccc" CRLF | ||||
| zzz,yyy,xxx | ||||
| 6. Fields containing line breaks (CR or CR/LF) and commas should be | ||||
| enclosed in double-quotes. For example: | ||||
| "aaa","b CRLF | ||||
| bb","ccc" CRLF | ||||
| zzz,yyy,xxx | ||||
| 7. If double-quotes are used to enclose fields, then double-quotes | ||||
| inside fields must be surrounded by double quotes. For example: | ||||
| "aaa","b"""bb","ccc" | ||||
| 8. Whitespace immediately before and after commas may be removed | ||||
| unless it appears inside double-quotes. For example: | ||||
| zzz, yyy , xxx | ||||
| would be processed as if it was: | A.1 Discussion Venue | |||
| zzz,yyy,xxx | Discussion about this document should be directed to the IETF-TYPES | |||
| mailing list <http://www.alvestrand.no/mailman/listinfo/ietf-types/> | ||||
| which is also reachable via <ietf-types@iana.org>. Of course, | ||||
| comments directly to the author are always welcome. | ||||
| The ABNF grammar [3] appears as follows: | A.2 Document Repository | |||
| COMMA = %x2C | Copies of this and earlier versions including multiple formats can be | |||
| found at <http://www.shaftek.org/publications/drafts/mime-csv/>. | ||||
| file = [header] *record | A.3 Document History | |||
| end-of-field = COMMA / (CR / CRLF) | Changes from draft-shafranovich-mime-csv-00 to | |||
| draft-shafranovich-mime-csv-01: | ||||
| header = *(*WSP field *WSP end-of-field) | o Type "text/comma-separated-values" has been removed | |||
| record = *(*WSP field *WSP end-of-field) | o The "encoding consideration" paragraph of Section 3 has been | |||
| changed to allow CRLF only as per section 4.1.1. of RFC 2046 [2]. | ||||
| This has been reflected in the ABNF grammar in Section 2. | ||||
| field = escaped / non-escaped | o ABNF grammar in Section 2 has been cleaned up. | |||
| escaped = DQUOTE *(VCHAR / CR / CRLF / 3*DQUOTE) DQUOTE | o Acknowledgements and status sections were added. | |||
| non-escaped = *VCHAR | o CSV format definition was moved to the normative section of the | |||
| document | ||||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
| End of changes. | ||||
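
The format rules listed in Section 2 of the -01 draft (CRLF between records, an optional header line, quoted fields that may contain commas and line breaks) can be exercised with a short Python sketch. This is only an illustration under stated assumptions, not part of either draft: the sample data and the helper names `parse_csv` and `write_csv` are hypothetical, and Python's standard `csv` module escapes an embedded double quote by doubling it, which is not the three-quote form shown in the example for rule 7.

```python
import csv
import io

# Hypothetical sample data. CRLF ends each record (rule 1), the last record
# has no trailing line break (rule 2), and the first line is a header (rule 3).
SAMPLE = (
    'field_a,field_b,field_c\r\n'
    '"aaa","b\r\nbb","c,cc"\r\n'   # quoted fields with a line break and a comma (rule 6)
    'zzz,"y""yy",xxx'              # embedded quote, doubled per Python's csv convention
)


def parse_csv(text):
    """Return (header, records) for CSV text whose first line is a header."""
    # newline="" disables newline translation, so CRLF inside quoted
    # fields reaches the csv module unchanged.
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = list(reader)
    return rows[0], rows[1:]


def write_csv(header, records):
    """Serialize a header and records using CRLF as the record delimiter."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


header, records = parse_csv(SAMPLE)
print(header)
print(records)
```

Writing the parsed data back with `write_csv` and parsing it again should yield the same header and records, since the writer quotes only the fields that need it and the reader accepts both quoted and unquoted fields (rule 5).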