CSV File Format - 43 years on and it's still hard?

January 18, 2011 21:46 by Matt Whitfield

This is one issue that really bugs me. CSV files and support of them. It's a fairly simple concept - you have a file which contains tabular data in a grid, and each column is separated by a comma. Sounds simple, right? Well, it is. And it isn't.

CSV files are great as long as you don't have any data in them that contain commas or new line characters - at which point it tends to go a bit pear shaped in terms of universal support.

Let's look at a few of the extended rules around CSV that so many applications get wrong:

Fields with embedded commas must be delimited with double-quote characters

This one is a fairly obvious one. If you have three values - for example 'Hello', 'sir, how', 'are you?' - then you should have three columns in CSV. However, just dumping out the contents leaves you with four:

Hello,sir,how,are you?

So, fields that contain embedded commas have to be delimited:

Hello,"sir,how",are you?

That's nice, but now assigns special meaning to the double-quote character. So, if fields contain a double quote character - we need to follow another rule:

Fields with embedded double-quotes must escape the contained double-quotes if they are delimited

What does that actually mean?

Imagine the value 'test,"testing",test'. We would need to delimit that, because it contains commas. However, the double quote characters would then take on special meaning, and therefore we need to escape them. Escaping them just involves replacing a single double-quote with two:

"test,""testing"",test"

So we're all good - right? Again, almost. CSV records are generally one record per line - but what about fields that contain multiple lines of text? Again, they need to be delimited in the same way, and this can mean that a single CSV data record can span multiple lines. This leads us to our third really important rule:

Fields that span multiple lines must be delimited

Imagine the phrase 'testy\ntest' (i.e. a new line between testy and test). That would be:

"testy
test"

Now we are all good. Properly.

Was that really so hard?

Next time you use a standard piece of software (*cough* SSMS *cough*) try exporting CSV and see what happens...

Tags: ,

compatibility

Comments

1/18/2011 10:07:19 PM #

Thomas Rushton

Nice to know I'm not the only one still having problems with CSVs... thelonedba.wordpress.com/.../

Thomas Rushton United Kingdom

1/19/2011 4:31:08 PM #

Arthur

How true!Good to know: to help wrestle with the CSV files you can call FileHelpers - http://www.filehelpers.com/

Arthur United States

Add comment



(Will show your Gravatar icon)



  Country flag

biuquote
  • Comment
  • Preview
Loading



CAPTCHA image




Data Surf

Tag cloud

Calendar

<<  October 2014  >>
MoTuWeThFrSaSu
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

View posts in large calendar