Web Page Scraping with the following...

Technical support and scripting issues

fthomas
Pro Scripter
Posts: 91
Joined: Fri Oct 03, 2008 6:40 pm

Post by fthomas » Thu Nov 11, 2010 2:19 am

Hey all, is it possible to scrape a web page in the following situations:

1. Scraping by class name rather than by ID or name?
2. Scraping the data that exists between a set of delimiters, such as, .
3. Doing either of the above with wildcards or regular expressions?

Second, is there any way to do this and have the results inserted into an array if more than one instance exists?

Thanks!

Frank

Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Thu Nov 11, 2010 9:09 am

Yes. Use ExtractTag to extract the entire BODY element (or even the HTML element), or, if a more specific section such as a DIV contains the data, extract that instead. Tell it to extract all HTML, then parse that HTML using RegEx.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
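
The parsing step described above (run a regular expression over the extracted HTML and collect every match) can be sketched roughly as follows. This is Python rather than Macro Scheduler script, and it only illustrates the regex side. The `sample` string stands in for whatever ExtractTag returns, and the function names `extract_by_class` and `extract_between` are made up for this example. It assumes double-quoted class attributes and no nesting of the same tag inside a matched element. Real-world HTML often breaks those assumptions.

```python
import re

# Matches an opening tag that carries a double-quoted class attribute,
# then lazily captures everything up to the first matching closing tag.
# Assumption: the same tag is not nested inside the matched element.
TAG_WITH_CLASS = re.compile(
    r'<(?P<tag>\w+)\b[^>]*?\sclass="(?P<classes>[^"]*)"[^>]*>'
    r'(?P<inner>.*?)</(?P=tag)>',
    re.DOTALL | re.IGNORECASE,
)

def extract_by_class(html, class_name):
    """Return the inner HTML of each element whose class list contains class_name."""
    return [
        m.group("inner").strip()
        for m in TAG_WITH_CLASS.finditer(html)
        if class_name in m.group("classes").split()
    ]

def extract_between(html, start, end):
    """Return every piece of text found between the literal delimiters start and end."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [piece.strip() for piece in re.findall(pattern, html, re.DOTALL)]

# Hypothetical page content, standing in for the HTML pulled from the browser.
sample = """
<body>
  <div class="price">10.50</div>
  <div class="price sale">8.25</div>
  <span class="note">in stock</span>
  <!--start-->alpha<!--end--> <!--start-->beta<!--end-->
</body>
"""

prices = extract_by_class(sample, "price")
items = extract_between(sample, "<!--start-->", "<!--end-->")
print(prices)
print(items)
```

Both functions return a list, so multiple instances end up in one array-like result. To allow wildcards in the delimiters, the `re.escape` calls could be dropped and regex fragments passed in instead.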

fthomas
Pro Scripter
Posts: 91
Joined: Fri Oct 03, 2008 6:40 pm

Post by fthomas » Thu Nov 11, 2010 8:19 pm

Thanks for the reply Marcus,

Can you offer some examples, or would it be possible to hire an hour of your time for some condensed learning?

Frank
