Getting Data From Dynamic Web Pages

Published on February 14, 2006 by Marcus Tettmar in Scripting

I’ve just posted an example in response to a forum message from someone who needed to extract data from finance.yahoo.com. The easiest way to create code that extracts data from a page is to use WebRecorder’s Tag Extractor. Unfortunately, finance.yahoo.com shows random adverts above the data, and the HTML used to display them is not always the same. What’s more, the adverts are not always shown. As a result the HTML of the page is dynamic, and the number of table elements can differ each time the page is loaded. Code that gets data from a fixed table element index may therefore not always retrieve the correct data.

My proposed solution is to have the script search for the start of the data table. If you go to http://finance.yahoo.com/q/ae?s=UPS you will see that the first element in the table has the text “Earnings Est”. So the script uses the ExtractTag function in a loop. The loop starts with tag TD, index 30. If the content of that element is “Earnings Est” it stops; if not, it loops back and increments the index, so on the next iteration it looks at TD 31, and so on. When the value “Earnings Est” is found, the script jumps out of the loop and we have the index of the first TD tag in the data table.
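The search loop can be sketched outside Macro Scheduler as well. The Python below is only an illustration of the idea, not the script from the forum topic: the class TDCollector, the helpers td_texts and find_marker_index, and the two tiny sample pages are all made up for this example, and the indices are Python's 0-based list positions rather than WebRecorder's tag indices.

```python
from html.parser import HTMLParser


class TDCollector(HTMLParser):
    """Collect the text of every TD element in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []   # text of each TD, in the order the tags open
        self._open = []   # positions in self.cells of TDs that are still open

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append("")
            self._open.append(len(self.cells) - 1)

    def handle_endtag(self, tag):
        if tag == "td" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every TD currently open (covers nested tables).
        for i in self._open:
            self.cells[i] += data


def td_texts(html):
    """Return the stripped text of every TD in the page."""
    parser = TDCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


def find_marker_index(cells, marker, start=0):
    """Step through the cells from `start` until one equals `marker`."""
    for index in range(start, len(cells)):
        if cells[index] == marker:
            return index
    return None  # marker not on the page


# Two made-up pages: the same data table, with and without an advert above it.
data_table = "<table><tr><td>Earnings Est</td><td>Current Qtr</td></tr></table>"
page_with_ad = "<table><tr><td>Ad</td><td>Buy now</td></tr></table>" + data_table
page_without_ad = data_table

print(find_marker_index(td_texts(page_with_ad), "Earnings Est"))     # 2
print(find_marker_index(td_texts(page_without_ad), "Earnings Est"))  # 0
```

The marker turns up at a different index on each page, which is exactly why a hard-coded index fails and a search succeeds. Returning None when the marker is missing also gives the caller a way to stop instead of looping forever.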

Now all we need to do to extract the data we want is apply an offset to the located index. For example, the value for Avg. Estimate, Current Qtr is in the 6th cell after the starting TD, so we add 6 to the index and use that with ExtractTag to get the data. The other values are retrieved the same way, each with its own offset.
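As a rough illustration of the offset step, here is a short Python sketch. It is not the Macro Scheduler code from the forum topic: the helper cell_at_offset is hypothetical, the cell list stands in for the page's TD elements, the column headings after “Current Qtr” are assumed, and the numbers are made-up placeholders rather than real estimates.

```python
def cell_at_offset(cells, start_index, offset):
    """Return the cell `offset` positions after the located starting cell."""
    target = start_index + offset
    if not 0 <= target < len(cells):
        raise IndexError(f"offset {offset} from index {start_index} is off the page")
    return cells[target]


# Stand-in for the TD contents of a page; the first two cells play the advert.
cells = [
    "Ad", "Buy now",
    "Earnings Est", "Current Qtr", "Next Qtr", "Current Year", "Next Year",
    "Avg. Estimate", "1.11", "2.22", "3.33", "4.44",  # placeholder values
]

start = cells.index("Earnings Est")     # index found by the search step
print(cell_at_offset(cells, start, 6))  # 1.11 (Avg. Estimate, Current Qtr)
print(cell_at_offset(cells, start, 7))  # 2.22 (the next column)
```

Because every lookup is relative to the located start, extra or missing advert cells above the table shift the start index but leave the offsets unchanged.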

For the code and my explanation, see the forum topic “How do I pick numbers off a webpage?”
