October 5, 2009

More on Reading and Writing CSV Files

Filed under: Scripting — Marcus Tettmar @ 1:35 pm

More on Reading from CSV Files

In Reading from CSV Files I showed you how you could read an entire CSV file into an array of rows and columns. It used a SELECT * SQL statement to retrieve the entire table.

If you just want to read in a specific set of data you could use a more selective SQL statement. But before we can do that we need to make a slight change to the connection string. The connection string used in my last article includes the text HDR=NO which tells the database driver that the CSV file does not have a header record. If we want to be selective in what data we wish to retrieve we would need to know how to reference that data. So we would need a header record. The following example CSV file starts with a header record:

forename,age,fav_color
“sally”,28,blue
“fred”,32,green
“john”,28,yellow

We now have a way to identify each column of data. We can now change our connection string to use HDR=YES. So our code to connect to the database becomes:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=YES;FMT=Delimited'
DBConnect>ConStr,dbH

We could now create a selective query. For example to select all people who are 28 years old we would use the following SQL:

SELECT * FROM csvfile.csv WHERE age=28

That would retrieve all fields. We may just want to retrieve their names:

SELECT forename FROM csvfile.csv WHERE age=28

In Macro Scheduler the code to do this is:

Let>SQL=SELECT forename FROM csvfile.csv WHERE age=28
DBQuery>dbH,SQL,fields,num_recs,num_fields

We then get an array called fields containing the results. See Reading from CSV files for example code that loops through the results.

Writing to CSV Files

As I mentioned in Reading from CSV Files, CSV files are just text files, so you could just use the WriteLn function to add a record of data to the end of the file:

WriteLn>”sally”,28,”blue”

But if you only want to add specific fields to the end of the line and want to save yourself the bother of having to correctly delimit the text you might want to treat the CSV file as a database table and use Macro Scheduler’s database functions instead.

In Reading from CSV Files we looked at using an ADO connection string and DBQuery to connect to the CSV file as if it were a database table and then using SQL to pull out the data. Well, we can do the same thing and use SQL “Update” and “Insert” queries in order to modify and insert data into the CSV file.

Here’s the code we used to connect to the CSV file as a data source:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH

To insert a row of data to the end of the file we can use the SQL INSERT statement as follows:

INSERT INTO csvfile.csv VALUES (“george”,19,”purple”)

In Macro Scheduler we need to use the DBExec command to execute this SQL:

Let>SQL=INSERT INTO csvfile.csv VALUES ("george",19,"purple")
DBExec>dbH,SQL,res

If you look at the connection string we used you’ll see the directive HDR=NO. This tells the database driver that the CSV file does NOT contain a header record. If we DO have a header record we should change this to HDR=YES. This also means we can specify individual column names in the INSERT statement. E.g. we may have the following CSV file:

forename,age,fav_color
“sally”,28,blue
“fred”,32,green

So now we can insert data for only two fields and because we have HDR=YES we can determine which of those fields should be set:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH
Let>SQL=INSERT INTO csvfile.csv (forename, fav_color) VALUES ("george","purple")
DBExec>dbh

Unfortunately it would appear that it is not possible to perform DELETE or UPDATE queries on CSV files. I’ve spent a few hours trying to find a definitive reference for JET’s text file driver but cannot find anything. All I know is that when I tried a DELETE or UPDATE query nothing happened. This is a shame as this possibility would be extremely powerful. As it is there’s not really much of a benefit over using DBExec instead of WriteLn.

If anyone knows of a definitive reference or a way to be able to perform DELETE or UPDATE queries using the Microsoft CSV driver, please let me know.