
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd']>

<!-- may be omitted for very short documents --> 
<?rfc toc="yes"?>
<?rfc sortrefs="yes"?>

<!-- these two save paper: start new sections from the same page etc. -->
<?rfc compact="yes"?> <?rfc subcompact="no"?>

<!-- other categories: bcp, exp, historic, std -->
<rfc ipr="full3667" category="info" docName="draft-shafranovich-mime-csv-00.txt">

<front>
	<title>MIME Type for CSV Files</title>
	<author fullname="Yakov Shafranovich" initials="Y." surname="Shafranovich">
		<organization>SolidMatrix Technologies, Inc.</organization>
		<address>
			<postal/>
			<email>ietf@shaftek.org</email>
			<uri>http://www.shaftek.org</uri>
		</address>
	</author>
	<date month="February" year="2005"/>
	<keyword>CSV</keyword>
	<keyword>MIME</keyword>
	<keyword>comma separated</keyword>

	<abstract>
		<t>This document defines MIME types "text/csv" and "text/comma-separated-values"  which used for
		Comma-Separated Values (CSV) files.</t>
	</abstract>
</front>

<middle>
<section title="Introduction">
	<t>The comma separated values format (CSV) has been used for exchanging and converting data between various spreadsheet
	programs for quite some time. Surprisingly, while this file is very common it has never been formally documented. Additionally,
	while the IANA MIME registration tree includes a registraton for "text/tab-separated-values" type,
	no MIME types have ever been registered with IANA for CSV. At the same time, various programs and operating systems have begun
	to use different MIME types for this format, many of which vary from system to system. This document
	seeks to formally register two MIME types for CSV in accordance with <xref target="RFC2048">RFC 2048</xref>.
	</t>
</section>

<section title="MIME Type Registration of text/csv and text/comma-separated-values">
	<t>This section provides the media-type registration application (as 
	per <xref target="RFC2048">RFC 2048</xref>, which will be submitted to IANA after IESG 
	approval of this document.</t>

	<t>To: ietf-types@iana.org</t>
	<t>Subject: Registration of MIME media types text/csv and text/comma-separated-values</t>
	<t>MIME media type name: text</t>
	<t>MIME subtype name: csv, comma-separated-values</t>
	<t>Required parameters: none</t>
	<t>Optional parameters: charset	
		<list style="empty">
			<t>Common usage of CSV is US-ASCII, but other character sets as defined by IANA for the "text"
			tree may be used.</t>
		</list>
	</t>
	<t>Encoding considerations:
		<list style="empty">	
			<t>While section 4.1.1. of <xref target="RFC2046">RFC 2046</xref> stipulates that "text" subtypes
			MUST use a CRLF sequence as a line break, in practice that is not always true for CSV. Therefore, implementors
			should be aware that either CR or CRLF maybe used as a line break for this format.
			</t>
		</list>
	</t>
	<t>Security considerations:
		<list style="empty">
			<t>CSV files contain passive text data which should not pose any risks. However, it is possible in theory
			that malicious binary data maybe included in order to exploit potential buffer overruns in the program
			processing CSV data. Additionally, private data maybe shared via this format which of course applies to
			any text data.
			</t>
		</list>
	</t>
	<t>
	Interoperability considerations:
		<list style="empty">
			<t>
			Due to lack of a single specification there are considerable differences among different implementations
			as described in appendix A. The most common difference among various format is whether double quotes (") are used
			to enclose strings. Implementors should "be conservative in what you do, be liberal in what you accept from others"
			(<xref target="RFC0793">RFC 793</xref>) when processing CSV files.
			</t>
		</list>
	</t>
	<t>Published specification:
		<list style="empty">
			<t>While numerous private specifications exist for various programs and systems, there is no single "master"
			specification for this format. A sampling of formats and discussion of differences is included in appendix A.
			</t>
		</list>
	</t>
	<t>Applications which use this media type:
		<list style="empty">
			<t>Spreadsheet programs and various data conversion utilities</t>
		</list>
	</t>
	<t>Additional information:
		<list style="empty">
			<t>Magic number(s): none</t>
			<t>File extension(s): CSV</t>
			<t>Macintosh File Type Code(s): TEXT</t>
		</list>
	</t>
	<t>Person & email address to contact for further information:
		<list style="empty">
			<t>Yakov Shafranovich &lt;ietf@shaftek.org&gt;</t>
		</list>
	</t>
	<t>Intended usage: COMMON</t>
	<t>Author/Change controller: IESG</t>
</section>

<!--
<section title="Acknowledgements">
	<t></t>
</section>
-->

<section title="IANA Considerations">
	<t>After IESG approval, IANA is expected to register these two types "text/csv" and "text/comma-separated-values"
	using the application provided in this document.</t>
</section>

<section title="Security Considerations">
	<t>See discussion above</t>
</section>
</middle>
<back>

<!-- references split to informative and normative -->
<references title="Normative References">
<reference anchor='RFC0793'>

<front>
<title abbrev='Transmission Control Protocol'>Transmission Control Protocol</title>
<author initials='J.' surname='Postel' fullname='Jon Postel'>
<organization>University of Southern California (USC)/Information Sciences Institute</organization>
<address>
<postal>
<street>4676 Admiralty Way</street>
<city>Marina del Rey</city>
<region>CA</region>
<code>90291</code>
<country>US</country></postal></address></author>
<date year='1981' day='1' month='September' /></front>

<seriesInfo name='STD' value='7' />
<seriesInfo name='RFC' value='793' />
<format type='TXT' octets='172710' target='ftp://ftp.isi.edu/in-notes/rfc793.txt' />
</reference>

<reference anchor='RFC2046'>

<front>
<title abbrev='Media Types'>Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</title>
<author initials='N.' surname='Freed' fullname='Ned Freed'>
<organization>Innosoft International, Inc.</organization>
<address>
<postal>
<street>1050 East Garvey Avenue South</street>
<city>West Covina</city>
<region>CA</region>
<code>91790</code>
<country>US</country></postal>
<phone>+1 818 919 3600</phone>
<facsimile>+1 818 919 3614</facsimile>
<email>ned@innosoft.com</email></address></author>
<author initials='N.' surname='Borenstein' fullname='Nathaniel S. Borenstein'>
<organization>First Virtual Holdings</organization>
<address>
<postal>
<street>25 Washington Avenue</street>
<city>Morristown</city>
<region>NJ</region>
<code>07960</code>
<country>US</country></postal>
<phone>+1 201 540 8967</phone>
<facsimile>+1 201 993 3032</facsimile>
<email>nsb@nsb.fv.com</email></address></author>
<date year='1996' month='November' />
<abstract>
<t>STD 11, RFC 822 defines a message representation protocol specifying considerable detail about US-ASCII message headers, but which leaves the message content, or message body, as flat US-ASCII text.  This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for</t>
<t>(1)   textual message bodies in character sets other than US-ASCII,</t>
<t>(2)   an extensible set of different formats for non-textual message bodies,</t>
<t>(3)   multi-part message bodies, and</t>
<t>(4)   textual header information in character sets other than US-ASCII.</t>
<t>These documents are based on earlier work documented in RFC 934, STD 11 and RFC 1049, but extends and revises them.  Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.</t>
<t>The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing sytem and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities.  The fifth and final document, RFC 2049, describes MIME
   conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography.</t>
<t>These documents are revisions of RFCs 1521 and 1522, which themselves were revisions of RFCs 1341 and 1342.  An appendix in RFC 2049 describes differences and changes from previous versions.</t></abstract></front>

<seriesInfo name='RFC' value='2046' />
<format type='TXT' octets='105854' target='ftp://ftp.isi.edu/in-notes/rfc2046.txt' />
</reference>

<reference anchor='RFC2234'>

<front>
<title abbrev='ABNF for Syntax Specifications'>Augmented BNF for Syntax Specifications: ABNF</title>
<author initials='D.' surname='Crocker' fullname='David H. Crocker' role='editor'>
<organization>Internet Mail Consortium</organization>
<address>
<postal>
<street>675 Spruce Dr.</street>
<city>Sunnyvale</city>
<region>CA</region>
<code>94086</code>
<country>US</country></postal>
<phone>+1 408 246 8253</phone>
<facsimile>+1 408 249 6205</facsimile>
<email>dcrocker@imc.org</email></address></author>
<author initials='P.' surname='Overell' fullname='Paul Overell'>
<organization>Demon Internet Ltd.</organization>
<address>
<postal>
<street>Dorking Business Park</street>
<street>Dorking</street>
<city>Surrey</city>
<region>England</region>
<code>RH4 1HN</code>
<country>UK</country></postal>
<email>paulo@turnpike.com</email></address></author>
<date year='1997' month='November' />
<keyword>ABNF</keyword>
<keyword>Augmented</keyword>
<keyword>Backus-Naur</keyword>
<keyword>Form</keyword>
<keyword>electronic</keyword>
<keyword>mail</keyword></front>

<seriesInfo name='RFC' value='2234' />
<format type='TXT' octets='24265' target='ftp://ftp.isi.edu/in-notes/rfc2234.txt' />
<format type='HTML' octets='36695' target='http://xml.resource.org/public/rfc/html/rfc2234.html' />
<format type='XML' octets='24067' target='http://xml.resource.org/public/rfc/xml/rfc2234.xml' />
</reference>

</references>
<references title="Informative References">

<reference anchor='CREATIVYST' target="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm">
	<front>
		<title>HOW-TO: The Comma Separated Value (CSV) File Format</title>
		<author initials='J.' surname='Repici' fullname='John Repici'>
			<organization>Creativyst, Inc.</organization>
			<address>
				<postal>
					<street>120 Jefferson St.</street>
					<street>Riverside</street>
					<street>NJ 08075</street>
					<country>USA</country>
				</postal>
				<phone>+1-856-764-5099</phone>
				<uri>http://www.creativyst.com</uri>
			</address>
		</author>
		<date year='2004'/>
	</front>
	<format type='HTML' target='http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm' />
</reference>

<reference anchor='EDOCEO' target="http://www.edoceo.com/utilis/csv-file-format.php">
	<front>
		<title>CSV Standard File Format</title>
		<author>
			<organization>Edoceo, Inc.</organization>
			<address>
				<postal>
					<street>620 W. Olympic Pl. #206</street>
					<street>Seatle</street>
					<street>WA 98119</street>
					<country>USA</country>
				</postal>
				<phone>+1-206-378-5619</phone>
				<uri>http://www.edoceo.com</uri>
			</address>
		</author>
		<date year='2004'/>
	</front>
	<format type='HTML' target='http://www.edoceo.com/utilis/csv-file-format.php' />
</reference>

<reference anchor='ART' target="http://www.catb.org/~esr/writings/taoup/html/ch05s02.html">
	<front>
		<title>The Art of Unix Programming, Chapter 5</title>
		<author initials='E.' surname='Raymond' fullname='Eric S. Raymond'>
			<organization>Addison-Wesley</organization>
			<address>
				<postal/>
				<email>esr@thyrsus.com</email>
				<uri>http://www.catb.org/~esr/</uri>
			</address>
		</author>
		<date year='2003' month="September" day="19"/>
	</front>
	<format type='HTML' target='http://www.catb.org/~esr/writings/taoup/html/ch05s02.html'/>
</reference>

<reference anchor='RICE' target="http://www.ricebridge.com/products/csvman/reference.htm">
	<front>
		<title>Documentation for Ricebridge CSV Manager</title>
		<author initials='R.' surname='Rodger' fullname='Richard Rodger'/>
		<author initials='O.' surname='Shanaghy' fullname='Orla Shanaghy'>
			<organization>Ricebridge</organization>
			<address>
				<postal>
					<street>Waterford</street>
					<country>Ireland</country>
				</postal>
			</address>
		</author>
		<date year='2005' month="February"/>
	</front>
	<format type='HTML' target='http://www.ricebridge.com/products/csvman/reference.htm' />
</reference>

<reference anchor='RFC2048'>
<front>
<title abbrev='MIME Registration Procedures'>Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures</title>
<author initials='N.' surname='Freed' fullname='Ned Freed'>
<organization>Innosoft International, Inc.</organization>
<address>
<postal>
<street>1050 East Garvey Avenue South</street>
<street>West Covina</street>
<street>CA 91790</street>
<country>USA</country></postal>
<phone>+1 818 919 3600</phone>
<facsimile>+1 818 919 3614</facsimile>
<email>ned@innosoft.com</email></address></author>
<author initials='J.' surname='Klensin' fullname='John Klensin'>
<organization>MCI</organization>
<address>
<postal>
<street>2100 Reston Parkway</street>
<street>Reston</street>
<street>VA 22091</street></postal>
<phone>+1 703 715-7361</phone>
<facsimile>+1 703 715-7436</facsimile>
<email>klensin@mci.net</email></address></author>
<author initials='J.' surname='Postel' fullname='Jon Postel'>
<organization>USC/Information Sciences Institute</organization>
<address>
<postal>
<street>4676 Admiralty Way</street>
<street>Marina del Rey</street>
<street>CA  90292</street>
<country>USA</country></postal>
<phone>+1 310 822 1511</phone>
<facsimile>+1 310 823 6714</facsimile>
<email>Postel@ISI.EDU</email></address></author>
<date year='1996' month='November' />
<area>Applications</area>
<keyword>mail</keyword>
<keyword>media type</keyword>
<keyword>multipurpose internet mail extensions</keyword>
<abstract>
<t>
   STD 11, RFC 822, defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, and leaves the
   message content, or message body, as flat US-ASCII text.  This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

<list>
<t>
    (1)   textual message bodies in character sets other than
          US-ASCII,
</t>
<t>
    (2)   an extensible set of different formats for non-textual
          message bodies,
</t>
<t>
    (3)   multi-part message bodies, and
</t>
<t>
    (4)   textual header information in character sets other than
          US-ASCII.
</t></list></t>
<t>
   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extends and revises them.  Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.
   This fourth document, RFC 2048, specifies various IANA registration
   procedures for the following MIME facilities:

<list>
<t>
    (1)   media types,
</t>
<t>
    (2)   external body access types,
</t>
<t>
    (3)   content-transfer-encodings.
</t></list></t>
<t>
   Registration of character sets for use in MIME is covered elsewhere
   and is no longer addressed by this document.
</t>
<t>
   These documents are revisions of RFCs 1521 and 1522, which themselves
   were revisions of RFCs 1341 and 1342.  An appendix in RFC 2049
   describes differences and changes from previous versions.
</t></abstract></front>

<seriesInfo name='BCP' value='13' />
<seriesInfo name='RFC' value='2048' />
<format type='TXT' octets='45033' target='ftp://ftp.isi.edu/in-notes/rfc2048.txt' />
<format type='HTML' octets='54732' target='http://xml.resource.org/public/rfc/html/rfc2048.html' />
<format type='XML' octets='43342' target='http://xml.resource.org/public/rfc/xml/rfc2048.xml' />
</reference></references>

<section anchor="appendix" title="Appendix A - Discussion of the CSV format">
	<t>While there are various specifications and implementations for the CSV format (for ex. <xref target="CREATIVYST"/>,
	<xref target="EDOCEO"/>, <xref target="RICE"/> and <xref target="ART"/>), no formal specification exists. This causes
	a wide variety of interpretations for CSV files. While this document does not seek to document the CSV format, 
	nevertheless we want to document the format that seems to be followed by most implementations:
		<list style="numbers">
			<t>Each record is located on a separate line delimited by a line break (either CR or CR/LF). For example:
				<vspace blankLines='1'/>
				aaa,bbb,ccc CRLF<vspace blankLines='0'/>	
				zzz,yyy,xxx CRLF
			</t>
			<t>The last record in the file may or may not have an ending linebreak. For example:
				<vspace blankLines='1'/>
				aaa,bbb,ccc CRLF<vspace blankLines='0'/>	
				zzz,yyy,xxx
			</t>
			<t>There maybe an optional header line appearing as the first line of the file with the same format
			as normal record lines. This header will contain names corresponding to the fields in the file
			and will usually contain the same number of fields as the records in the rest of the file. For example:
				<vspace blankLines='1'/>
				field_name,field_name,field_name CRLF<vspace blankLines='0'/>
				aaa,bbb,ccc CRLF<vspace blankLines='0'/>
				zzz,yyy,xxx CRLF
			</t>
			<t>Within the header and each record there may be one or more fields, delimited by commas. The last
			field in the record may or may not be followed by a comma. For example:
				<vspace blankLines='1'/>
				aaa,bbb,ccc
			</t>
			<t>Each field may or may not be enclosed in double quotes, however some programs such as Microsoft Excel
			do not use double quotes at all. For example:
				<vspace blankLines='1'/>
				"aaa","bbb","ccc" CRLF<vspace blankLines='0'/>	
				zzz,yyy,xxx
			</t>
			<t>Field containing line breaks (CR or CR/LF) and commas should be enclosed in double-quotes. For example:
				<vspace blankLines='1'/>
				"aaa","b CRLF<vspace blankLines='0'/>	
				bb","ccc" CRLF<vspace blankLines='0'/>	
				zzz,yyy,xxx
			</t>
			<t>If double-quotes are used to enclosed fields, then double-quotes inside fields
			must be surounded by double quotes. For example:
				<vspace blankLines='1'/>
				"aaa","b"""bb","ccc"
			</t>
			<t>Whitespace immediately before and after commas maybe removed unless it appears inside double-quotes.
			For example:
				<vspace blankLines='1'/>
				zzz,   yyy   ,   xxx
				<vspace blankLines='1'/>
				would be processed as if it was:
				<vspace blankLines='1'/>
				zzz,yyy,xxx
			</t>
		</list>
	</t>
	<t>The <xref target="RFC2234">ABNF grammar</xref> appears as follows:
		<list style="empty">
			<t>COMMA = %x2C</t>
			<t>file = [header] *record</t>
			<t>end-of-field = COMMA / (CR / CRLF)</t>
			<t>header = *(*WSP field *WSP end-of-field)</t>
			<t>record = *(*WSP field *WSP end-of-field)</t>
			<t>field = escaped / non-escaped</t>
			<t>escaped = DQUOTE *(VCHAR / CR / CRLF / 3*DQUOTE) DQUOTE</t>
			<t>non-escaped = *VCHAR</t>
		</list>
	</t>
</section>
</back>

</rfc>

