Work Opportunity

Technical support and scripting issues

Paul51
Newbie
Posts: 3
Joined: Tue Dec 15, 2020 8:03 pm

Work Opportunity

Post by Paul51 » Thu Dec 17, 2020 2:48 pm

One of my sons has a need to process websites like these:
(an easier one) https://partners.sophos.com/english/dir ... ?l=Germany
(a harder one) https://www.cloudtango.org/

to pick out the company name, address, phone number, etc. and put them into a spreadsheet.

To date I've been doing this using Macro Scheduler (MS) to copy each page into a text file, and C# to parse all the info dumped into that text file.
It doesn't have to be done like that; what's needed is website => spreadsheet.
The difficulty is that the company records on a website are not always consistent (e.g. a phone number might be missing, which breaks the regular layout), and typically more time is spent checking and accommodating the anomalies than processing the 98% of good records.

I need to move on to other areas in his business, so I said I would post on the forum here to see if anyone would be interested in picking up this work. The projects generally arrive in bursts, but on average I would say 1 project every 2 or 3 weeks, and, depending on the difficulty, 1 to 3 days per project.

Ideally this would be someone who could do the whole website-to-spreadsheet route, but failing that, website-to-text-file would be a step in the right direction.
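To illustrate the inconsistency problem described above, here is a rough Python sketch (not the Macro Scheduler/C# approach I currently use). It assumes each record has already been parsed into named fields; the column names and the sample companies are invented for illustration. Because each value is keyed by name rather than by position, a missing phone number leaves a blank cell instead of shifting the later columns, and incomplete records can be listed for manual checking.

```python
import csv
import io

# Hypothetical column names; real sites will differ.
FIELDS = ["name", "street", "city", "country", "phone", "website"]


def records_to_csv(records):
    """Write dict records to CSV text, leaving missing or None fields blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        # Replace None with "" so a missing value never shifts later columns.
        writer.writerow({k: ("" if v is None else v) for k, v in rec.items()})
    return buf.getvalue()


def incomplete(records, required=("name", "phone")):
    """Return the records that lack any required field, for manual checking."""
    return [r for r in records if any(not r.get(k) for k in required)]


# Invented sample data: the second record has no street, phone or website.
sample = [
    {"name": "Alpha GmbH", "street": "Hauptstr. 1", "city": "Berlin",
     "country": "Germany", "phone": "+49 30 123456", "website": "alpha.example"},
    {"name": "Beta AG", "city": "Hamburg", "country": "Germany", "phone": None},
]

print(records_to_csv(sample))
print(incomplete(sample))
```

The resulting CSV text opens directly in a spreadsheet, and the list from `incomplete` is the small set of anomalies that still needs a human look.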

Paul

Dorian (MJT support)
Automation Wizard
Posts: 1348
Joined: Sun Nov 03, 2002 3:19 am

Re: Work Opportunity

Post by Dorian (MJT support) » Thu Dec 17, 2020 3:51 pm

I just replied to your support request regarding this.
Yes, we have a Custom Scripting Service. Message me or go here

Paul51
Newbie
Posts: 3
Joined: Tue Dec 15, 2020 8:03 pm

Re: Work Opportunity

Post by Paul51 » Thu Dec 17, 2020 4:01 pm

Thanks Dorian. I've replied to your email.
Paul

hagchr
Automation Wizard
Posts: 327
Joined: Mon Jul 05, 2010 7:53 am
Location: Stockholm, Sweden

Re: Work Opportunity

Post by hagchr » Sun Dec 27, 2020 3:55 pm

I think this has been sorted already, but it was a good training exercise, so I did some work on getting the data from the simpler website. I'm posting the script in case anybody is interested. The result can be copied from the message box and pasted into Excel.

Code:

// Needs to be adjusted with location of chromedriver.exe
Let>CHROMEDRIVER_EXE=C:\Users\Christer\Desktop\ChromeFile\chromedriver.exe

// Start session
ChromeStart>session_id

// Navigate to site
Let>URL1=https://partners.sophos.com/english/directory/search?l=Germany
ChromeNavigate>session_id,url,URL1

// Get source data
ChromeGetInfo>session_id,source,strResult

// Get all records
Let>tmp0=(?ms)plSearch\.allResults = \[\K.+(?=\}\];\RplSearch.pagination =)
RegEx>tmp0,strResult,0,m,nm,0

// Remove field labels and unwanted fields (partner URL, tier logo, icon paths)
Let>tmp0=\{"Name":|"MailingStreet":|"MailingCityStatePostalCode":|"MailingCountry":|"Phone":|"Website":|"ViewPartnerUrl":".+?",|"TierLogoUrl":|"TierLogoName":|"/images/icons.+?",
RegEx>tmp0,m_1,0,m2,nm2,1,,strResult

// Create one company per line
Let>tmp0=\},
RegEx>tmp0,strResult,0,m3,nm3,1,CRLF,strResult

// Adj null -> empty
Let>tmp0=null
RegEx>tmp0,strResult,0,m4,nm4,1,,strResult

// Pad phone numbers with a leading and trailing space so Excel does not reformat them
Let>tmp0=(?m-s)(("[^"]+",){4})\K(?P<Phone>"[^"]+")
RegEx>tmp0,strResult,0,m5,nm5,1, $<Phone> ,strResult

// Adj \u0026 -> &
StringReplace>strResult,\u0026,&,strResult

// Adj \u0027 -> ' (\u0027 is the JSON escape for an apostrophe)
StringReplace>strResult,\u0027,',strResult

// Close session
ChromeQuit>session_id

// Show the result in a message box, ready to copy into Excel
MDL>strResult
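For comparison, the same idea can be sketched in Python rather than Macro Scheduler. This is only an illustration under assumptions: it presumes the page source embeds the records as a JSON array assigned to `plSearch.allResults` (which is what the regex above relies on), and `sample_source` is an invented stand-in for the real page. Letting a JSON parser read the array means escapes such as \u0026 and \u0027 are decoded automatically and null fields come back as empty values, so the clean-up steps are not needed.

```python
import json
import re


def extract_results(page_source):
    """Return the list assigned to plSearch.allResults, or [] if it is absent."""
    match = re.search(r"plSearch\.allResults\s*=\s*", page_source)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text follows it.
    results, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return results


# Invented stand-in for the real page source; field names mirror the script above.
sample_source = r'''
<script>
plSearch.allResults = [{"Name":"M\u0026M Systems","MailingStreet":"Hauptstr. 1","Phone":null,"Website":"mm.example"},{"Name":"O\u0027Neil IT","MailingStreet":null,"Phone":"+49 30 123456","Website":null}];
plSearch.pagination = {};
</script>
'''

for rec in extract_results(sample_source):
    # Missing (null) values become empty strings instead of shifting columns.
    print(rec["Name"], "|", rec.get("Phone") or "")
```

Each record comes back as a dictionary keyed by field name, so a missing phone number cannot push other values into the wrong column.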

Paul51
Newbie
Posts: 3
Joined: Tue Dec 15, 2020 8:03 pm

Re: Work Opportunity

Post by Paul51 » Tue Dec 29, 2020 5:19 pm

There are 3 or 4 current projects that Dorian is doing. Would you be interested in any future ones? As I said, they are likely to be intermittent.
Also, I notice you're in Sweden. George will likely need a small amount of Swedish translation (of job titles). Would you be interested in that?
Paul
