November 5, 2009

How to Sort a CSV File

Filed under: Automation,Scripting — Marcus Tettmar @ 11:39 am

This morning, forum user hip asked how a script could sort a CSV file. Seems CSV files, sorting and Excel has become a bit of a hot topic here lately. Since Excel can read and save to CSV and also has a Sort function we might as well make use of it. Here’s a little VBScript function I put together which will sort a CSV file by a given column:

VBSTART
Sub SortCSVFile(file,sort_column)
  Set XLApp = CreateObject("Excel.Application")
  XLApp.Workbooks.Open(file)
  
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SortFields.Clear
  Set SortCol = XLApp.Range(sort_column)
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SortFields.Add(SortCol)
  Set Cell = XLApp.ActiveWorkbook.ActiveSheet.Cells.Find("*", XLApp.ActiveWorkbook.ActiveSheet.Cells(1, 1),,,,2)
  Set mYRange = XLApp.Range("A1", XLApp.ActiveWorkBook.ActiveSheet.Cells(Cell.Row, Cell.Column))
    
  XLApp.ActiveWorkbook.ActiveSheet.Sort.SetRange(mYRange)
  XLApp.ActiveWorkbook.ActiveSheet.Sort.Apply
  
  XLApp.ActiveWorkBook.Save
  XLApp.ActiveWorkBook.Close false
  XLApp.quit
End Sub
VBEND

The function opens the CSV file in Excel – without making Excel visible – then performs the sort, saves the file and closes Excel. You won’t see Excel appear on the screen.

Here’s how to use it. If you want to sort your file on column B and don’t have a header row, use:

VBRun>SortCSVFile,C:\Documents\my.csv,B1

To sort on column A and you DO have a header row use:

VBRun>SortCSVFile,C:\Documents\my.csv,A2

UPDATE: Here’s a simplified version which should work in Excel XP/2003 and should be able to detect whether or not your data has a header row:

VBSTART
Sub SortCSVFile(file,sort_column)
  Set XLApp = CreateObject("Excel.Application")
  XLApp.Workbooks.Open(file) 
  XLApp.ActiveWorkbook.ActiveSheet.Range("A1").Sort XLApp.ActiveWorkbook.ActiveSheet.Range(sort_column),,,,,,,0  
  XLApp.ActiveWorkBook.Save
  XLApp.ActiveWorkBook.Close false
  XLApp.quit
End Sub
VBEND

What if you don’t have Microsoft Excel?

As mentioned in my reply on the forum another method you could use is to open the CSV file as a recordset using DBQuery and apply an “ORDER BY” clause to your SQL statement. That would sort it into memory. Then you would need to loop through the array and write the data back out to a CSV file using WriteLn.

Something like:

//new file to write to
Let>new_file=%SCRIPT_DIR%\example_sorted.csv
IfFileExists>new_file
  DeleteFile>new_file
Endif

//set WriteLn NOT to add line breaks (we'll add those ourselves)
Let>WLN_NOCRLF=1
//The CSV delimiter
Let>comma=,

//Connect to and query the CSV using ORDER BY PRICE to sort on Price column
Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=%SCRIPT_DIR%;Extended Properties='text;HDR=YES;FMT=Delimited'
DBConnect>ConStr,dbH
DBQuery>dbH,select * from example.csv ORDER BY PRICE,fields,num_recs,num_fields

//If we have records, loop through records and fields, writing out to new file
If>num_recs>0
  //first need to copy the header record ...
  ReadLn>%SCRIPT_DIR%\example.csv,1,header_line
  WriteLn>new_file,wlnres,%header_line%%CRLF%

  //the loop
  Let>rec=0
  Repeat>rec
    Let>rec=rec+1
    Let>field=0
    Repeat>field
      Let>field=field+1
      Let>this_field=fields_%rec%_%field%
      //write the field out to the new file
      WriteLn>new_file,wlnres,this_field
      //add the comma delimiter if needed
      If>field<num_fields
        WriteLn>new_file,wlnres,comma
      Endif
    Until>field=num_fields
    //add CRLF at end of line
    WriteLn>new_file,wlnres,%CRLF%
  Until>rec=num_recs
Endif
DBClose>dbH

//if you want, delete the input file and replace with the new sorted one:
//DeleteFile>%SCRIPT_DIR%\example.csv
//RenameFile>%SCRIPT_DIR%\example_sorted.csv,%SCRIPT_DIR%\example.csv

October 30, 2009

Convert XML to CSV

Filed under: Automation,Scripting — Marcus Tettmar @ 12:27 pm

Earlier this month I wrote about how you can read and write data from/to CSV files as if they were database tables using DBQuery. Andrew asked if I knew of a way that XML files could be treated as database tables in the same way. I did find a couple of references to some ODBC drivers online but one was not yet available and the other was part of a larger commercial package so I was unable to try them.

Since then it has occurred to me that there is a way we can convert XML files to CSV (or XLS) files using Excel. Once the data is in CSV file format we can then retrieve it easily using DBQuery.

The more technically aware will probably ask why we can’t just use the MSXML.DomDocument object in VBScript to read in the XML and parse it. Well, while that is of course possible, many will find the ability to access the XML data in a tabular database-like format easier and more familiar.

So, without further ado, here’s my ConvertXMLtoCSV function:

VBSTART
Sub ConvertXMLtoCSV(xmlFile,csvFile)
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.OpenXML(xmlFile,2)
    xlBook.SaveAs csvFile, 6
    xlBook.close false
    xlApp.quit
End Sub
VBEND

You will need Excel installed for this to work. Use the function like this:

VBRun>ConvertXMLtoCSV,c:\docs\my.xml,c:\docs\converted.csv

And now you can quickly read the data into an array using DBQuery.

The function works quietly in the background. You won’t see Excel open – just a short delay while the CSV file is created.

To convert to an Excel file rather than a CSV file change the code to:

VBSTART
Sub ConvertXMLtoXLS(xmlFile,xlsFile)
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.OpenXML(xmlFile,2)
    xlBook.SaveAs xlsFile, 56
    xlBook.close false
    xlApp.quit
End Sub
VBEND

Please note I’m using Excel 2007 and haven’t tried this with earlier versions though I believe Excel 2003 can also read from XML files. Success may also depend on the format of your XML.

See:
Reading From CSV Files
More on Reading and Writing CSV Files
Retrieve Entire Excel Sheet Using DBQuery

October 28, 2009

Randomly Shuffle an Array

Filed under: Scripting — Marcus Tettmar @ 9:00 am

I was recently asked how to get a list of files from a folder, and then randomly shuffle that list. The following code retrieves the list of files to an array using GetFileList and Separate and then loops through the array shuffling it randomly:

GetFileList>c:\docs\*.*,file_list
Separate>file_list,;,files_array
If>files_array_count>0
  //randomize the array
  Let>k=0
  Repeat>k
    Let>k=k+1
    Random>{%files_array_count%-1},random_index
    Let>random_index=random_index+1
    If>random_index<>k
        Let>Temp=files_array_%k%
        Let>files_array_%k%=files_array_%random_index%
        Let>files_array_%random_index%=Temp
    Endif
  Until>k=files_array_count
Endif

Edit: As pointed out in the comments this is somewhat naive and could cause an element to be swapped back to its original position. A better method is given in the comments.

October 5, 2009

More on Reading and Writing CSV Files

Filed under: Scripting — Marcus Tettmar @ 1:35 pm

More on Reading from CSV Files

In Reading from CSV Files I showed you how you could read an entire CSV file into an array of rows and columns. It used a SELECT * SQL statement to retrieve the entire table.

If you just want to read in a specific set of data you could use a more selective SQL statement. But before we can do that we need to make a slight change to the connection string. The connection string used in my last article includes the text HDR=NO which tells the database driver that the CSV file does not have a header record. If we want to be selective in what data we wish to retrieve we would need to know how to reference that data. So we would need a header record. The following example CSV file starts with a header record:

forename,age,fav_color
“sally”,28,blue
“fred”,32,green
“john”,28,yellow

We now have a way to identify each column of data. We can now change our connection string to use HDR=YES. So our code to connect to the database becomes:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=YES;FMT=Delimited'
DBConnect>ConStr,dbH

We could now create a selective query. For example to select all people who are 28 years old we would use the following SQL:

SELECT * FROM csvfile.csv WHERE age=28

That would retrieve all fields. We may just want to retrieve their names:

SELECT forename FROM csvfile.csv WHERE age=28

In Macro Scheduler the code to do this is:

Let>SQL=SELECT forename FROM csvfile.csv WHERE age=28
DBQuery>dbH,SQL,fields,num_recs,num_fields

We then get an array called fields containing the results. See Reading from CSV files for example code that loops through the results.

Writing to CSV Files

As I mentioned in Reading from CSV Files, CSV files are just text files, so you could just use the WriteLn function to add a record of data to the end of the file:

WriteLn>”sally”,28,”blue”

But if you only want to add specific fields to the end of the line and want to save yourself the bother of having to correctly delimit the text you might want to treat the CSV file as a database table and use Macro Scheduler’s database functions instead.

In Reading from CSV Files we looked at using an ADO connection string and DBQuery to connect to the CSV file as if it were a database table and then using SQL to pull out the data. Well, we can do the same thing and use SQL “Update” and “Insert” queries in order to modify and insert data into the CSV file.

Here’s the code we used to connect to the CSV file as a data source:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH

To insert a row of data to the end of the file we can use the SQL INSERT statement as follows:

INSERT INTO csvfile.csv VALUES (“george”,19,”purple”)

In Macro Scheduler we need to use the DBExec command to execute this SQL:

Let>SQL=INSERT INTO csvfile.csv VALUES ("george",19,"purple")
DBExec>dbH,SQL,res

If you look at the connection string we used you’ll see the directive HDR=NO. This tells the database driver that the CSV file does NOT contain a header record. If we DO have a header record we should change this to HDR=YES. This also means we can specify individual column names in the INSERT statement. E.g. we may have the following CSV file:

forename,age,fav_color
“sally”,28,blue
“fred”,32,green

So now we can insert data for only two fields and because we have HDR=YES we can determine which of those fields should be set:

Let>ConStr=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\My Documents;Extended Properties='text;HDR=NO;FMT=Delimited'
DBConnect>ConStr,dbH
Let>SQL=INSERT INTO csvfile.csv (forename, fav_color) VALUES ("george","purple")
DBExec>dbh

Unfortunately it would appear that it is not possible to perform DELETE or UPDATE queries on CSV files. I’ve spent a few hours trying to find a definitive reference for JET’s text file driver but cannot find anything. All I know is that when I tried a DELETE or UPDATE query nothing happened. This is a shame as this possibility would be extremely powerful. As it is there’s not really much of a benefit over using DBExec instead of WriteLn.

If anyone knows of a definitive reference or a way to be able to perform DELETE or UPDATE queries using the Microsoft CSV driver, please let me know.

September 9, 2009

Including Often Used Library Files

Filed under: Scripting — Marcus Tettmar @ 3:49 pm

If you create lots of Macro Scheduler macros you probably have pieces of code you use frequently. You might have blocks of VBScript or subroutines you often use in your macros. Copying and pasting code like that into your scripts is time consuming and also means if you ever need to change that code you’ll end up having to edit all your macros. A better way to use such “library code” is to Include it using the, yes, Include function:

Include>%SCRIPT_DIR%\MyLib.scp

This function includes the content of MyLib.scp in your macro.

Some people seem confused about how this works. It’s simpler than you might realise. It literally sucks in the code in place of the Include line and then executes it. When the script runs that Include line is actually replaced with the code in MyLib.scp. So the code in MyLib.scp actually becomes part of the overall macro.

One way to visualise how it works is to step through an Inlude line in the debugger. You’ll see the debugger pull the code into the script. Here’s a video to show you what I mean:

In the above example I’ve kept the include file small so that the entire script still fits in the editor. You’ll see the code jump into the editor and then disappear when the script ends or is stopped.

With Include you can reference commonly used functions and code. Some projects may involve lots of scripts that all use some core functions which you would put into an Include file.

Include is different to the Macro command which just “calls” another script file. With the Macro command that other script is run in isolation, almost like another process. The code isn’t shared between the two scripts. Include doesn’t just run the code, it pulls the code into the current macro so that it becomes part of it and is executed at that point. It remains part of the script until the script terminates. So the code could be subroutines or dialog blocks or VBScript blocks or anything you need the main script to reference.

August 28, 2009

Determine IE Version and Automate IE Dialogs

Filed under: Automation,Scripting — Marcus Tettmar @ 2:25 pm

If you are automating web pages or web applications you may at some point need to automate dialog boxes produced by client side script, like the one that pops up when you click the link below:

Confirm Popup

So you might want to be aware that the title of these popups varies between different versions of Internet Explorer. In IE6 it was “Microsoft Internet Explorer”. In IE7 it is “Windows Internet Explorer” and in IE8 it is now “Message from webpage”.

I much prefer the last one from an automation point of view since it differs significantly from the title of the main IE window which usually contains the phrase “Microsoft Internet Explorer” in IE7 and “Windows Internet Explorer” in IE8.

So if we wanted to click on the Yes button if we were writing a script for IE8 we could use the following code:

SetFocus>Message from webpage
Press Enter

But what if we didn’t know what version of IE our script would be automating? If you were giving your script to someone else or putting it on some other PC you may not know what version of IE was being used. What we need to do is make the script find out what version of IE is installed on the computer.

Here’s how to make the script determine which version of IE is in use:

RegistryReadKey>HKEY_LOCAL_MACHINE,SOFTWARE\Microsoft\Internet Explorer,Version,IEver
Separate>IEVER,.,bits
Let>IEVER=bits_1

IEVER now contains the major version number of Internet Explorer.

So we could now use this to define the title we expect from the dialog box and confirm it:

RegistryReadKey>HKEY_LOCAL_MACHINE,SOFTWARE\Microsoft\Internet Explorer,Version,IEver
Separate>IEVER,.,bits
Let>IEVER=bits_1

Let>dialogTitle_IE6=Microsoft Internet Explorer
Let>dialogTitle_IE7=Windows Internet Explorer
Let>dialogTitle_IE8=Message from webpage

SetFocus>dialogTitle_IE%IEVER%
Press Enter

Of course this is all assuming you’re using the English version of Internet Explorer. I cannot vouch for other languages.

August 25, 2009

The Power of DOS: Looping Through Subfolders

Filed under: Automation,Scripting — Marcus Tettmar @ 7:37 pm

Twice today, for two different people, I needed to write some code to iterate through files in a tree of subfolders. The subfolder structure could not be known up front and there may be any number of subfolders and any number of folders deep.

One could use GetDirList iteratively, or use VBScript’s FileSystem Object. But then I was reminded of the power of DOS by JRL in this post.

The following returns a file containing a list of the jpg files throughout all folders, starting in c:\root_folder\ with full paths:

dir c:\root_folder\*.jpg /s /b > outputfile.txt

So, as JRL suggests, we can call that command from Macro Scheduler and then read in the output file. We should use a temporary file:

Let>RP_WINDOWMODE=2
Let>RP_WAIT=1
Run>cmd /c dir c:\root_folder\*.jpg /s /b > %TEMP_DIR%~temp_dir_list~
ReadFile>%TEMP_DIR%~temp_dir_list~,dir_list

Now we can explode that list into an array and loop through it, doing whatever it is we need to do to each file:

Separate>dir_list,CRLF,files
If>files_count>0
Let>f=0
Repeat>f
  Let>f=f+1
  Let>this_file=files_%f%
  //do something with this_file
Until>f=files_count

It’s easy to forget the power of DOS – it can save you a fair few lines of code sometimes.

Remember that any DOS command can be “piped” to an output file by appending >filename to the command line. This is very useful as we can then use ReadFile to get the output from the command, as we have done above.

August 11, 2009

Find and Click on Screen Text with MODI

Filed under: Automation,Scripting — Marcus Tettmar @ 8:21 am

Gale Dyvig has just posted some code in the forums showing how to use the Microsoft Office Document Imaging OCR component to locate text on the screen, find its position and click on it. Very useful. You’ll find it here.

MODI is a component that comes with Microsoft Office. You need to ensure Microsoft Office Document Imaging and Office Shared Features / Proofing Tools is installed in Office Tools.

Thanks Gale!

July 2, 2009

Selecting from a Drop Down List

Filed under: Automation,Scripting — Marcus Tettmar @ 1:12 pm

A project I’m currently working on involves populating data into a rather poorly designed user interface.  A particular challenge were the drop down lists and list boxes due to the fact that you can’t “drill down” by sending the text of the item you want selected.

The solution we settled upon uses the text capture function GetTextAtPoint in a loop which arrows down the entire list, with GetTextAtPoint capturing the text in the selection box to see if it’s the value required.  If it is we stop, else we continue to press down and loop.  We stop if the item selected is the same as the previous item, signifying that we’ve reached the end (obviously this assumes the list doesn’t have duplicates appearing together).

Here’s the subroutine:

SRT>dropDownSelect
/*****
Usage: GoSub>dropDownSelect,X,Y,text_to_select

Where X and Y are coordinates in the edit box (where the item appears once selected)
 and text_to_select is the text you want selected  

Important: the drop down needs to have the focus
*****/
  If>dropDownSelect_Var_3<>
    GetTextReset
    Press Home
    Wait>0.1
    Let>prev_comboText=Z@$%#XXX
    Let>dropDownFound=FALSE
    Label>select_drop
    GetTextAtPoint>dropDownSelect_Var_1,dropDownSelect_Var_2,comboText,c
    Pos>dropDownSelect_Var_3,comboText,1,p
    If>p<>1
      If>prev_comboText<>comboText
        Press Down
        Wait>0.2
        Let>prev_comboText=comboText
        Goto>select_drop
      Endif
    Else
      Let>dropDownFound=TRUE
    Endif
  Endif
END>dropDownSelect

So, for example, you could call it something like:

Press Tab
GoSub>dropDownSelect,240,355,Apples

Note that the drop down needs to be focused. Usually you’d be “tabbing” through fields on the form, so once you’ve used “Press Tab” to get focus onto the control you can call dropDownSelect.

This version requires that you pass the X,Y screen position of the text box area of the drop down. In development at the moment we have a new function which returns the handle of the currently focused object, and this can then be used in GetWindowPos to get the X,Y position. So in future you will be able to determine the X,Y position dynamically.

Hopefully you won’t need this function as you can usually “drill down” on a list box or drop down list by sending the text of the item you want as mentioned here. But for those badly designed controls that won’t allow that this function can be a life saver.

June 4, 2009

Calling Macro Scheduler Scripts from VB/VBA

Filed under: General,Scripting — Marcus Tettmar @ 4:10 pm

I’m often asked how you can run Macro Scheduler macros from other programming languages, particularly VB.  

Macro Scheduler scripts can be run from the command line.  See the help file topic “Command Line Options”.  E.g.:

msched.exe “c:\someplace\mymacro.scp”

VB/VBA lets you execute external commands/applications via the Shell function:

Shell “””c:\program files\macro scheduler 11\msched.exe”” “”c:\scripts\example.scp”””, vbNormalFocus

The only problem with the Shell function is that it does not wait until what it is calling has finished running before continuing.  So it fires off the macro and the program continues.  In most cases you’d want to wait for the script to finish before you continue.  To do this you can use the following ShellAndWait function:

    Private Declare Sub Sleep Lib "kernel32" ( _
        ByVal dwMilliseconds As Long)
    Private Declare Function GetExitCodeProcess Lib "kernel32" ( _
        ByVal hProcess As Long, ByVal lpExitCode As Long) As Long
    Private Declare Function timeGetTime Lib "winmm.dll" () As Long
    Private Declare Function OpenProcess Lib "kernel32" ( _
        ByVal dwDesiredAccess As Long, ByVal bInheritHandle As Long, ByVal dwProcessId As Long) As Long
    Private Const STILL_ACTIVE = &H103
    Private Const PROCESS_QUERY_INFORMATION = &H400
    Private Declare Function CloseHandle Lib "kernel32" ( _
        ByVal hObject As Long) As Long

    Public Function ShellAndWait( _
     ByVal sShell As String, _
            Optional ByVal eWindowStyle As Integer = vbNormalFocus, _
            Optional ByRef sError As String = "", _
            Optional ByVal lTimeOut As Long = 2000000000 _
        ) As Boolean
        Dim hProcess As Long
        Dim lR As Long
        Dim lTimeStart As Long
        Dim bSuccess As Boolean

        On Error GoTo ShellAndWaitError

        ' This is v2 which is somewhat more reliable: 
        hProcess = OpenProcess(PROCESS_QUERY_INFORMATION, False, Shell(sShell, eWindowStyle))
        If (hProcess = 0) Then
            sError = "This program could not determine whether the process started." & _
                 "Please watch the program and check it completes."
            ' Only fail if there is an error - this can happen 
            ' when the program completes too quickly. 
        Else
            bSuccess = True
            lTimeStart = timeGetTime()
            Do
                ' Get the status of the process 
                GetExitCodeProcess(hProcess, lR)
                ' Sleep during wait to ensure the other process gets 
                ' processor slice: 
DoEvents:       Sleep(100)
                If (timeGetTime() - lTimeStart > lTimeOut) Then
                    ' Too long! 
                    sError = "The process has timed out."
                    lR = 0
                    bSuccess = False
                End If
            Loop While lR = STILL_ACTIVE
        End If
        ShellAndWait = bSuccess

        Exit Function

ShellAndWaitError:
        sError = Err.Description
        Exit Function
    End Function

So your code becomes:

ShellAndWait “””c:\program files\macro scheduler 11\msched.exe”” “”c:\scripts\example.scp”””, vbNormalFocus

If you have Macro Scheduler Pro you can compile the script to an EXE and then just execute the exe, making the command line simpler:

ShellAndWait “c:\someplace\mymacro.exe”, vbNormalFocus

The above code is based on the code found here.

« Newer PostsOlder Posts »