March 12, 2009

Reading from CSV Files

Filed under: Scripting — Marcus Tettmar @ 11:47 am

An interesting discussion arose in the forums yesterday on how to read data from a CSV file. The usual suggestion is to use Separate to split each line into an array using the comma as the delimiter, as explained in my post here.

The problem, as highlighted in the forum topic, is: what if the line has a field which itself contains a comma? In CSV if a field contains a comma it will be encased in double quotes. E.g.:

Sally,”2,500″,Blue

If we just use Separate using the comma as a delimiter we will end up with the following values:

Sally
“2
500”
Blue

Of course this isn’t what we want. We want to end up with:

Sally
2,500
Blue

In the forum discussion a number of suggestions were made such as to use regular expressions to find the fields inside quotes and replace the commas with something else, then use Separate and then put the commas back. Certainly we could resort to some kind of text parsing and I’ve updated the post with a RegEx solution.

However, we can avoid all this by using DBQuery and an ADO connection string that lets us connect to and query a CSV file and treat it like a database table. All the work is done for us:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH
DBQuery>dbH,select * from test.csv,fields,num_recs,num_fields

In the connection string specify the path of the CSV file in Data Source. We can also tell it whether or not the CSV file has a header line by setting HDR to NO or Yes.

This reads all the data from the CSV file into an array.

Here’s some code to read through the array and display each field one by one:

Let>rec=0
Repeat>rec
  Let>rec=rec+1
  Let>field=0
  Repeat>field
    Let>field=field+1
    Let>this_field=fields_%rec%_%field%
    MessageModal>this_field
  Until>field=num_fields
Until>rec=num_recs

If the CSV file has a header line change HDR to Yes and then we could also tell DBQuery to return the field names:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH
DBQuery>dbH,select * from test.csv,fields,num_recs,num_fields,1

Then we can access the data using the field names. E.g.:

Let>rec=0
Repeat>rec
  Let>rec=rec+1
  Let>name=fields_%rec%_name
  Let>number=fields_%rec%_number
  Let>colour=fields_%rec%_colour
  MessageModal>%name% %number% %colour%
Until>rec=num_recs

For more help with Macro Schedulers database functions see:
Using Macro Scheduler’s Database Functions