November 5, 2009

How to Sort a CSV File

Filed under: Automation,Scripting — Marcus Tettmar @ 11:39 am

This morning, forum user hip asked how a script could sort a CSV file. Seems CSV files, sorting and Excel has become a bit of a hot topic here lately. Since Excel can read and save to CSV and also has a Sort function we might as well make use of it. Here’s a little VBScript function I put together which will sort a CSV file by a given column:

VBSTART
Sub SortCSVFile(file,sort_column)
  Set XLApp = CreateObject("Excel.Application")
  XLApp.Workbooks.Open(file)
  
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SortFields.Clear
  Set SortCol = XLApp.Range(sort_column)
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SortFields.Add(SortCol)
  Set Cell = XLApp.ActiveWorkbook.ActiveSheet.Cells.Find("*", XLApp.ActiveWorkbook.ActiveSheet.Cells(1, 1),,,,2)
  Set mYRange = XLApp.Range("A1", XLApp.ActiveWorkBook.ActiveSheet.Cells(Cell.Row, Cell.Column))
    
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SetRange(mYRange)
  XLApp.ActiveWorkbook.ActiveSheet.Sort.Apply
  
  XLApp.ActiveWorkBook.Save
  XLApp.ActiveWorkBook.Close false
  XLApp.quit
End Sub
VBEND

The function opens the CSV file in Excel – without making Excel visible – then performs the sort, saves the file and closes Excel. You won’t see Excel appear on the screen.

Here’s how to use it. If you want to sort your file on column B and don’t have a header row, use:

VBRun>SortCSVFile,C:\Documents\my.csv,B1

To sort on column A and you DO have a header row use:

VBRun>SortCSVFile,C:\Documents\my.csv,A2

UPDATE: Here’s a simplified version which should work in Excel XP/2003 and should be able to detect whether or not your data has a header row:

VBSTART
Sub SortCSVFile(file,sort_column)
  Set XLApp = CreateObject("Excel.Application")
  XLApp.Workbooks.Open(file) 
  XLApp.ActiveWorkbook.ActiveSheet.Range("A1").Sort XLApp.ActiveWorkbook.ActiveSheet.Range(sort_column),,,,,,,0  
  XLApp.ActiveWorkBook.Save
  XLApp.ActiveWorkBook.Close false
  XLApp.quit
End Sub
VBEND

What if you don’t have Microsoft Excel?

As mentioned in my reply on the forum another method you could use is to open the CSV file as a recordset using DBQuery and apply an “ORDER BY” clause to your SQL statement. That would sort it into memory. Then you would need to loop through the array and write the data back out to a CSV file using WriteLn.

Something like:

//new file to write to
Let>new_file=%SCRIPT_DIR%\example_sorted.csv
IfFileExists>new_file
  DeleteFile>new_file
Endif

//set WriteLn NOT to add line breaks (we'll add those ourselves)
Let>WLN_NOCRLF=1
//The CSV delimiter
Let>comma=,

//Connect to and query the CSV using ORDER BY PRICE to sort on Price column
Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=%SCRIPT_DIR%;Extended Properties='text;HDR=YES;FMT=Delimited'
DBConnect>ConStr,dbH
DBQuery>dbH,select * from example.csv ORDER BY PRICE,fields,num_recs,num_fields

//If we have records, loop through records and fields, writing out to new file
If>num_recs>0
  //first need to copy the header record ...
  ReadLn>%SCRIPT_DIR%\example.csv,1,header_line
  WriteLn>new_file,wlnres,%header_line%%CRLF%

  //the loop
  Let>rec=0
  Repeat>rec
    Let>rec=rec+1
    Let>field=0
    Repeat>field
      Let>field=field+1
      Let>this_field=fields_%rec%_%field%
      //write the field out to the new file
      WriteLn>new_file,wlnres,this_field
      //add the comma delimiter if needed
      If>field<num_fields
        WriteLn>new_file,wlnres,comma
      Endif
    Until>field=num_fields
    //add CRLF at end of line
    WriteLn>new_file,wlnres,%CRLF%
  Until>rec=num_recs
Endif
DBClose>dbH

//if you want, delete the input file and replace with the new sorted one:
//DeleteFile>%SCRIPT_DIR%\example.csv
//RenameFile>%SCRIPT_DIR%\example_sorted.csv,%SCRIPT_DIR%\example.csv