October 3, 2013

Automating Google

Filed under: Automation,Scripting — Marcus Tettmar @ 1:59 pm

This has come up a few times lately so I thought I’d post it here.

One or two people have been asking about writing a Macro Scheduler script that performs a Google search and pulls back the resulting URLs. They have discovered that automating the Google search page can be awkward because the page is dynamic: as soon as you type in the search box, the page starts updating in real time.

Some people suggested using HTTPRequest to get the page content directly and then parsing it with RegEx. This can be done, but it is also tricky: you get back all the dynamic code plus other clutter such as AdWords ads, so it is hard to find the right patterns.

There’s an easier way. Use Google’s API URL instead. For example:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Apple%20Pie&rsz=large&start=1

This performs a search for “Apple Pie”. Try it in your browser and you’ll see the content is far less cluttered and easy to parse. With rsz=large it returns 8 results per request. For the next 8 you would change the start parameter to 9, and so on.
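To make the paging idea concrete, here is a rough sketch in Python rather than Macro Scheduler code. It builds the request URLs for a given number of results, using the start values described above (1, 9, 17, ...), and pulls the URLs out of a response body. The function names are made up for illustration, and the response layout (a "responseData" object holding a "results" list whose items have a "url" field) is an assumption about what the API sends back, so check it against what you see in your browser.

```python
import json
from math import ceil
from urllib.parse import quote, urlencode

API_URL = "http://ajax.googleapis.com/ajax/services/search/web"
PAGE_SIZE = 8  # rsz=large gives 8 results per request


def build_search_urls(term, wanted):
    """Return one API URL per page needed to cover `wanted` results."""
    pages = ceil(wanted / PAGE_SIZE)
    urls = []
    for page in range(pages):
        params = {
            "v": "1.0",
            "q": term,
            "rsz": "large",
            # Paging as described in the post: 1, then 9, then 17, ...
            "start": 1 + page * PAGE_SIZE,
        }
        # quote_via=quote encodes spaces as %20 instead of +
        urls.append(API_URL + "?" + urlencode(params, quote_via=quote))
    return urls


def extract_urls(response_text):
    """Pull result URLs out of one response body.

    Assumes the JSON layout responseData -> results -> url.
    """
    data = json.loads(response_text)
    results = (data.get("responseData") or {}).get("results", [])
    return [item["url"] for item in results]
```

For instance, build_search_urls("Apple Pie", 16) gives two URLs, the first of which is the one shown above and the second the same with start=9. Fetching each URL and passing the body to extract_urls would then collect the links.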

So here’s a simple example script which takes a search term, specifies how many results to get, extracts the URLs and writes them to a text file: