This document is out of date. The most recent version is available here: http://www.rfc-editor.org/rfc/rfc4180.txt



 TOC 
Network Working GroupY. Shafranovich
Internet-DraftSolidMatrix Technologies, Inc.
Expires: August 20, 2005February 16, 2005

Common Format and MIME Type for CSV Files

draft-shafranovich-mime-csv-01.txt

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 20, 2005.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document documents the format used for Comma-Separated Values (CSV) files and registers the associated MIME type "text/csv".



Table of Contents

1.  Introduction
2.  Definition of the CSV format
3.  MIME Type Registration of text/csv
4.  IANA Considerations
5.  Security Considerations
6.  Acknowledgments
7.  References
    7.1  Normative References
    7.2  Informative References
§  Author's Address
A.  Status of This Document [To Be Removed Upon Publication]
    A.1  Discussion Venue
    A.2  Document Repository
    A.3  Document History
§  Intellectual Property and Copyright Statements




 TOC 

1. Introduction

The comma separated values format (CSV) has been used for exchanging and converting data between various spreadsheet programs for quite some time. Surprisingly, while this format is very common it has never been formally documented. Additionally, while the IANA MIME registration tree includes a registration for "text/tab-separated-values" type, no MIME types have ever been registered with IANA for CSV. At the same time, various programs and operating systems have begun to use different MIME types for this format, many of which vary from system to system. This document seeks to document the format of comma separated values (CSV) files and to formally register the "text/csv" MIME type for CSV in accordance with RFC 2048Freed, N., Klensin, J. and J. Postel, Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures, November 1996.[4].



 TOC 

2. Definition of the CSV format

While there are various specifications and implementations for the CSV format (for ex. [5]Repici, J., HOW-TO: The Comma Separated Value (CSV) File Format, 2004., [6]Edoceo, Inc., CSV Standard File Format, 2004., [7]Rodger, R. and O. Shanaghy, Documentation for Ricebridge CSV Manager, February 2005. and [8]Raymond, E., The Art of Unix Programming, Chapter 5, September 2003.), no formal specification exists which causes a wide variety of interpretations for CSV files. This section seeks to document the format that seems to be followed by most implementations:

  1. Each record is located on a separate line delimited by a line break (CRLF). For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF
  2. The last record in the file may or may not have an ending linebreak. For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx
  3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example:

    field_name,field_name,field_name CRLF
    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF
  4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example:

    aaa,bbb,ccc
  5. Each field may or may not be enclosed in double quotes (however some programs such as Microsoft Excel do not use double quotes at all). For example:

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx
  6. Field containing line breaks (CRLF) and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx
  7. If double-quotes are used to enclosed fields, then double-quotes inside fields must be surounded by double quotes. For example:

    "aaa","b"""bb","ccc"

The ABNF grammarCrocker, D., Ed. and P. Overell, Augmented BNF for Syntax Specifications: ABNF, November 1997.[1] appears as follows:

file = [header CRLF] record *(CRLF record) [CRLF]

header = name *(COMMA name)

record = field *(COMMA field)

name = field

field = (escaped / non-escaped)

escaped = DQUOTE *(VCHAR / CR / LF / CRLF / 3*DQUOTE) DQUOTE

non-escaped = *VCHAR

COMMA = %x2C

CR = %x0D ;as per section 6.1 of RFC 2234Crocker, D., Ed. and P. Overell, Augmented BNF for Syntax Specifications: ABNF, November 1997.[1]

LF = %x0A ;as per section 6.1 of RFC 2234Crocker, D., Ed. and P. Overell, Augmented BNF for Syntax Specifications: ABNF, November 1997.[1]

CRLF = CR LF ;as per section 6.1 of RFC 2234Crocker, D., Ed. and P. Overell, Augmented BNF for Syntax Specifications: ABNF, November 1997.[1]

VCAR = %x21-7E ;as per section 6.1 of RFC 2234Crocker, D., Ed. and P. Overell, Augmented BNF for Syntax Specifications: ABNF, November 1997.[1]



 TOC 

3. MIME Type Registration of text/csv

This section provides the media-type registration application (as per RFC 2048Freed, N., Klensin, J. and J. Postel, Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures, November 1996.[4], which will be submitted to IANA after IESG approval of this document.

To: ietf-types@iana.org

Subject: Registration of MIME media type text/csv

MIME media type name: text

MIME subtype name: csv

Required parameters: none

Optional parameters: charset

Common usage of CSV is US-ASCII, but other character sets as defined by IANA for the "text" tree may be used.

Encoding considerations:

As per section 4.1.1. of RFC 2046Freed, N. and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, November 1996.[2], this media type uses CRLF to denote line breaks. However, implementors should be aware that some implementations may use other values.

Security considerations:

CSV files contain passive text data which should not pose any risks. However, it is possible in theory that malicious binary data maybe included in order to exploit potential buffer overruns in the program processing CSV data. Additionally, private data maybe shared via this format (which of course applies to any text data).

Interoperability considerations:

Due to lack of a single specification there are considerable differences among different implementations. Implementors should "be conservative in what you do, be liberal in what you accept from others" (RFC 793Postel, J., Transmission Control Protocol, September 1981.[3]) when processing CSV files. An attempt at a common definition can be found in Section 2Definition of the CSV format.

Published specification:

While numerous private specifications exist for various programs and systems, there is no single "master" specification for this format. An attempt at documentating a common definition can be found in Section 2Definition of the CSV format.

Applications which use this media type:

Spreadsheet programs and various data conversion utilities

Additional information:

Magic number(s): none

File extension(s): CSV

Macintosh File Type Code(s): TEXT

Person & email address to contact for further information:

Yakov Shafranovich <ietf@shaftek.org>

Intended usage: COMMON

Author/Change controller: IESG



 TOC 

4. IANA Considerations

After IESG approval, IANA is expected to register the MIME type "text/csv" using the application provided in Section 3MIME Type Registration of text/csv of this document.



 TOC 

5. Security Considerations

See discussion above



 TOC 

6. Acknowledgments

The author would like to thank Dave Crocker, Martin Duerst and Bruce Lilly for their helpful suggestions. A special word of thanks to Dave for helping with the ABNF grammar.



 TOC 

7. References



 TOC 

7.1 Normative References

[1] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997 (TXT, HTML, XML).
[2] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.
[3] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.


 TOC 

7.2 Informative References

[4] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", BCP 13, RFC 2048, November 1996 (TXT, HTML, XML).
[5] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File Format", 2004 (HTML).
[6] Edoceo, Inc., "CSV Standard File Format", 2004 (HTML).
[7] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV Manager", February 2005 (HTML).
[8] Raymond, E., "The Art of Unix Programming, Chapter 5", September 2003 (HTML).


 TOC 

Author's Address

  Yakov Shafranovich
  SolidMatrix Technologies, Inc.
Email:  ietf@shaftek.org
URI:  http://www.shaftek.org


 TOC 

Appendix A. Status of This Document [To Be Removed Upon Publication]

A.1 Discussion Venue

Discussion about this document should be directed to the IETF-TYPES mailing list http://www.alvestrand.no/mailman/listinfo/ietf-types/ which is also reachable via ietf-types@iana.org. Of course, comments directly to the author are always welcome.

A.2 Document Repository

Copies of this and earlier versions including multiple formats can be found at http://www.shaftek.org/publications/drafts/mime-csv/.

A.3 Document History

Changes from draft-shafranovich-mime-csv-00 to draft-shafranovich-mime-csv-01:



 TOC 

Intellectual Property Statement

Disclaimer of Validity

Copyright Statement

Acknowledgment