CSV File Format - 43 years on and it's still hard?

January 18, 2011 21:46 by Matt Whitfield

This is one issue that really bugs me. CSV files and support of them. It's a fairly simple concept - you have a file which contains tabular data in a grid, and each column is separated by a comma. Sounds simple, right? Well, it is. And it isn't.

CSV files are great as long as you don't have any data in them that contain commas or new line characters - at which point it tends to go a bit pear shaped in terms of universal support.

Let's look at a few of the extended rules around CSV that so many applications get wrong:

Fields with embedded commas must be delimited with double-quote characters

This one is a fairly obvious one. If you have three values - for example 'Hello', 'sir, how', 'are you?' - then you should have three columns in CSV. However, just dumping out the contents leaves you with four:

Hello,sir,how,are you?

So, fields that contain embedded commas have to be delimited:

Hello,"sir,how",are you?

That's nice, but now assigns special meaning to the double-quote character. So, if fields contain a double quote character - we need to follow another rule:

Fields with embedded double-quotes must escape the contained double-quotes if they are delimited

What does that actually mean?

Imagine the value 'test,"testing",test'. We would need to delimit that, because it contains commas. However, the double quote characters would then take on special meaning, and therefore we need to escape them. Escaping them just involves replacing a single double-quote with two:

"test,""testing"",test"

So we're all good - right? Again, almost. CSV records are generally one record per line - but what about fields that contain multiple lines of text? Again, they need to be delimited in the same way, and this can mean that a single CSV data record can span multiple lines. This leads us to our third really important rule:

Fields that span multiple lines must be delimited

Imagine the phrase 'testy\ntest' (i.e. a new line between testy and test). That would be:

"testy
test"

Now we are all good. Properly.

Was that really so hard?

Next time you use a standard piece of software (*cough* SSMS *cough*) try exporting CSV and see what happens...

Tags: ,

compatibility

Comments

1/18/2011 10:07:19 PM #

Thomas Rushton

Nice to know I'm not the only one still having problems with CSVs... thelonedba.wordpress.com/.../

Thomas Rushton United Kingdom

1/19/2011 4:31:08 PM #

Arthur

How true!Good to know: to help wrestle with the CSV files you can call FileHelpers - http://www.filehelpers.com/

Arthur United States

Add comment



(Will show your Gravatar icon)



  Country flag

biuquote
  • Comment
  • Preview
Loading



CAPTCHA image




Schema Inspector

Tag cloud

Calendar

<<  September 2014  >>
MoTuWeThFrSaSu
25262728293031
1234567
891011121314
15161718192021
22232425262728
293012345

View posts in large calendar