Having Trouble Extracting from a site using Chrome Functions...

rjw524
Pro Scripter
Posts: 104
Joined: Wed May 09, 2012 9:45 pm
Location: Michigan

Having Trouble Extracting from a site using Chrome Functions...

Post by rjw524 » Mon Jul 31, 2023 7:46 pm

Hi,

I'm trying to use Chrome Functions to scrape data from Indeed. Here's a link to the page I'm trying to scrape:

https://www.indeed.com/jobs?q=Accountan ... d5dd495be3

I just need to extract the following data:

1) Job Title
2) Company Name
3) City & State

I tried a basic script just to see if it would pull element data, and it is pulling nothing. I admit that I am very new to using the Chrome Functions.

Code:

//Set IGNORESPACES to 1 to force script interpreter to ignore spaces.
//If using IGNORESPACES quote strings in {" ... "}
//Let>IGNORESPACES=1

Let>CHROMEDRIVER_EXE=c:\chromedriver.exe
Let>CHROMEDRIVER_OPTIONS=--start-maximized
 
//start a Chrome session
ChromeStart>session_id
 
//navigate to the Indeed search results page
ChromeNavigate>session_id,url,https://www.indeed.com/jobs?q=Accountant&l=Detroit&from=searchOnHP&vjk=a9a229d5dd495be3/

wait 10.0

ChromeFindElements>session_id,xpath,main,elements
ChromeGetElementData>session_id,elements_1,text,theText
ChromeGetElementData>session_id,elements_1,attribute/innerHTML,theHtml

Wait 1.0

MDL>%theText%

Any help on how to write this would be greatly appreciated.

Thanks,

rjw524

Dorian (MJT support)
Automation Wizard
Posts: 1354
Joined: Sun Nov 03, 2002 3:19 am

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by Dorian (MJT support) » Mon Jul 31, 2023 9:05 pm

This post may help you with the correct xpath usage.
Yes, we have a Custom Scripting Service. Message me or go here

rjw524

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by rjw524 » Mon Jul 31, 2023 10:49 pm

Dorian (MJT support) wrote:
Mon Jul 31, 2023 9:05 pm
This post may help you with the correct xpath usage.

Hi Dorian,

I tried following along with the post you sent but I had a lot of trouble following the code and what was taking place. I usually am able to follow along to a degree but this is a bit beyond my skills at this point.

Everything I need to extract appears to be in these types of blocks but I just can't understand the code to pull it out:

Code:

<span title="Staff Accountant" id="jobTitle-a7d721ba6a2d3e4c">Staff Accountant</span></a></h2></div>
<div class="heading6 company_location tapItem-gutter companyInfo">
  <span class="companyName">Seligman Group</span>
  <div class="companyLocation">Southfield, MI 48076</div>
</div>
<div class="heading6 tapItem-gutter metadataContainer metadataContainer-wrap noJEMChips"><div class="metadata salary-snippet-container"><div class="attribute_snippet" data-testid="attribute_snippet_testid">$65,000 - $70,000 a year

If I want to pull out the Title ("Staff Accountant"), Company Name ("Seligman Group") and Location ("Southfield, MI") for all of the different results, how would I code that using the xpath example you demonstrated?

Thanks again,

Dorian (MJT support)

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by Dorian (MJT support) » Tue Aug 01, 2023 9:36 am

The trick was to get the entire job panel. There are 15 of them so you should see 15 results.

From this:
<td class="resultContent"><div class="css-1m4cuuf e37uo190"><h2 class="jobTitle css-1h4a4n5 eu4oa1w0" tabindex="-1">

We get:
ChromeFindElements>session_id,xpath,//td[@class='resultContent'],el

This should be enough to get you started, but you may need to fine-tune it a little.

Code:

//Find every job panel on the results page
ChromeFindElements>session_id,xpath,//td[@class='resultContent'],el
MDL>%el_count% found

//Loop through each job panel
Let>extractloop=0
Repeat>extractloop
  Let>extractloop=extractloop+1
  //Get the panel's visible text
  ChromeGetElementData>session_id,el_%extractloop%,text,details
  MDL>details
  //Split the text into one array entry per line
  Separate>details,LF,detail

  //Show each line of the panel in turn
  Let>detailloop=0
  Repeat>detailloop
    Let>detailloop=detailloop+1
    MDL>detail_%detailloop%
  Until>detailloop,detail_count
Until>extractloop,el_count
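
To pull out just the three fields rjw524 asked for, the loop above could be trimmed to something like the sketch below. It assumes each panel's text lists the title, company and location on its first three lines, which nothing in this thread confirms, so check the MDL output from the loop above before relying on it. The loop variable k is made up for illustration, and session_id and el_count are assumed to come from the ChromeStart and ChromeFindElements calls already shown.

```
//ASSUMPTION: lines 1-3 of each panel are title, company and location
Let>k=0
Repeat>k
  Let>k=k+1
  ChromeGetElementData>session_id,el_%k%,text,details
  Separate>details,LF,detail
  //Show the three fields for this job on one line
  MDL>%detail_1% | %detail_2% | %detail_3%
Until>k,el_count
```

Because all three values come from the same panel, they cannot drift out of step with each other the way separately fetched lists might.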


rjw524

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by rjw524 » Tue Aug 01, 2023 4:22 pm

Dorian (MJT support) wrote:
Tue Aug 01, 2023 9:36 am
The trick was to get the entire job panel. There are 15 of them so you should see 15 results.

From this :
<td class="resultContent"><div class="css-1m4cuuf e37uo190"><h2 class="jobTitle css-1h4a4n5 eu4oa1w0" tabindex="-1">

We get:
ChromeFindElements>session_id,xpath,//td[@class='resultContent'],el

This should be enough to get you started, but you may need to fine tune a little...

Thanks Dorian!

This was VERY helpful. I am looking into it right now to understand the code.

So, did you go into "View Page Source" to determine what to extract?

Dorian (MJT support)

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by Dorian (MJT support) » Tue Aug 01, 2023 4:44 pm

Hover over the element you want, right-click and press N (Inspect), then do it a second time, as the first attempt often does not select the correct part of the code. Once Developer Tools is open you can also press CTRL-SHIFT-C, then see what happens when you hover over any page element.

Just as food for thought, below is the HTML for the job title. We could have used any of the code snippets below, but I didn't want to get the elements separately, as who's to say "job title 5" would always match "location 5".

<div class="css-1m4cuuf e37uo190"><h2 class="jobTitle css-1h4a4n5 eu4oa1w0" tabindex="-1"><a id="job_f670767fb708a4f4" data-mobtk="1h6o32aqejga6801" data-jk="f670767fb708a4f4" data-hiring-event="false"

Code:

ChromeFindElements>session_id,xpath,//div[@class='css-1m4cuuf e37uo190'],el
MDL>%el_count% found
ChromeGetElementData>session_id,el_1,text,details
mdl>details

ChromeFindElements>session_id,xpath,//h2[@class='jobTitle css-1h4a4n5 eu4oa1w0'],el
MDL>%el_count% found
ChromeGetElementData>session_id,el_1,text,details
mdl>details

ChromeFindElements>session_id,xpath,//h2[@tabindex='-1'],el
MDL>%el_count% found
ChromeGetElementData>session_id,el_1,text,details
mdl>details

Essentially it boils down to this, where AAA is always the HTML tag, BBB is the attribute name, and CCC is the attribute's value:

<AAA BBB="CCC">

ChromeFindElements>session_id,xpath,//AAA[@BBB='CCC'],el
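
As a rough illustration of that pattern, here it is applied to the company name and location tags from the HTML rjw524 posted earlier (span class="companyName" and div class="companyLocation"). This is only a sketch: it assumes session_id comes from an earlier ChromeStart, that the results page has finished loading, and that Indeed still uses those class names. The variable names co, loc, company and location are made up for the example.

```
//AAA=span, BBB=class, CCC=companyName
ChromeFindElements>session_id,xpath,//span[@class='companyName'],co
MDL>%co_count% company names found
ChromeGetElementData>session_id,co_1,text,company
MDL>company

//AAA=div, BBB=class, CCC=companyLocation
ChromeFindElements>session_id,xpath,//div[@class='companyLocation'],loc
MDL>%loc_count% locations found
ChromeGetElementData>session_id,loc_1,text,location
MDL>location
```

Fetching the fields separately like this runs into the caveat mentioned above: nothing guarantees that co_5 and loc_5 belong to the same job, which is why grabbing the whole job panel is the safer route.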

Marcus wrote a very good blog post on it here: https://www.mjtnet.com/blog/2020/02/27/ ... functions/

rjw524

Re: Having Trouble Extracting from a site using Chrome Functions...

Post by rjw524 » Tue Aug 01, 2023 7:03 pm

Dorian (MJT support) wrote:
Tue Aug 01, 2023 4:44 pm
Marcus wrote a very good blog post on it here : https://www.mjtnet.com/blog/2020/02/27/ ... functions/

Thanks, Dorian. This article is extremely helpful!
