Chrome Capturing HTML / Javascript Page source Values

Hints, tips and tricks for newbies

Moderators: Dorian (MJT support), JRL

Franklin
Newbie
Posts: 16
Joined: Mon Dec 14, 2015 5:14 pm

Chrome Capturing HTML / Javascript Page source Values

Post by Franklin » Mon Jun 07, 2021 9:41 pm

Hello.

I'm looking for the best approach to capture the following data points from an online HTML page.

I tried setting up a Chrome session and looking for the data points via ChromeFindElements,
but they all come back empty or zero.

Any suggestions on an approach to capture data points that appear in the page source as follows?

Code:


<script type="text/javascript">
  infoDataLayer = [
  {
  "pageType" : "product",
  "topLevelCategory" : "Computer",
  "productCategory" : "Computer Hardware New",
  "isbn" : "9781484267000",
  "productId" : "9781484267000",
  "pPriceGross" : "99.99",
  "ePriceGross" : "89.99",
  "eIsbn" : "9781484267000",
  "pIsbn" : "9781484267000",
  "fn" : "Example Title",
  "description" : "1 descript 2 go",
  "currency" : "UK",
  "url" : "https://someurl.com/9781484267000",
  "photo" : "https://someurl.com/pic/images/9781484267000.jpg",
  "ecommerce" : {
    "currencyCode" : "UK",
    "detail" : {
      "products" : [ {
        "name" : "1 descript 2 go",
        "id" : "978-1-4842-6700-0",
        "price" : "99.99",
        "brand" : "XYZ",
        "category" : "Computer Hardware",
        "variant" : "ibo",
        "dimension21" : "ABC",
        "dimension22" : "-92"
      }, {
        "name" : "1 descript 2 go",
        "id" : "978-1-4842-6700-0",
        "price" : "99.99",
        "brand" : "XYZ",
        "category" : "Computer Hardware",
        "variant" : "hardcopy",
        "dimension21" : "ABC",
        "dimension22" : "-227"
      } ]
    }
  },
  "content" : {
    "authorization" : {
      "status" : false
    }
  }
}
  ];

  publicDataLayer = publicDataLayer || [];
  combineDataLayer = sprMerge(publicDataLayer, infoDataLayer);
</script>


Franklin
Newbie
Posts: 16
Joined: Mon Dec 14, 2015 5:14 pm

Re: Chrome Capturing HTML / Javascript Page source Values

Post by Franklin » Thu Jun 10, 2021 10:39 am

Sharing one workable solution I discovered using the Python (PYExec) features of Macro Scheduler.
This was tricky because the data never shows up in the normal, visible elements of the page;
it only exists in the HTML source, so it has to be found by parsing each text/javascript script section.

The Python integration in Macro Scheduler made this otherwise overwhelming task approachable. (*Tipping my hat to Marcus and Team*)

The code snippet below shows the concept I used to approach this problem.

Code:

Let>PYTHON_DLL=python37.dll
Let>base_url=https://someurl.com/ebook-details

/* 
python_code:

import requests
import time
import csv
import re
from bs4 import BeautifulSoup
import json

# Function to append one row of scraped data to the CSV file.
def write_to_csv(list_input):
    try:
        # newline="" stops the csv module writing blank rows on Windows.
        with open("data.csv", "a", newline="") as fopen:
            csv_writer = csv.writer(fopen)
            csv_writer.writerow(list_input)
    except Exception:
        return False

r = requests.get("%base_url%")  # %base_url% is substituted by Macro Scheduler before the code runs

soup = BeautifulSoup(r.text, 'html.parser')

# Find every <script type="text/javascript"> block in the HTML.
all_scripts = soup.find_all('script', type='text/javascript')

# print(all_scripts[2]) - uncomment to see output

# output / CSV formatting code [ goes here]
...
...

# Write the 3rd text/javascript block to the CSV (indexing starts at zero).
write_to_csv([str(all_scripts[2])])

# Convert the tag to a string so it can be passed back to Macro Scheduler.
bookdata = str(all_scripts[2])

# Anything printed to stdout is returned in the PYExec output variable.
print("All Done")

*/

//Load the Python code to a variable

LabelToVar>python_code,pcode

//Run the Python code

PYExec>pcode,output,bookdata
MessageModal>Summary of eBook Cataloging Data is: %bookdata%

// ..continue script
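
For anyone who wants to take it a step further, below is a rough sketch of what the omitted "output / CSV formatting" step could look like. It is only an illustration of the idea, not the exact code from my macro: the extract_fields helper is made up for this example, the regex assumes the infoDataLayer array ends with "];" exactly as in the sample source in my first post, and the field names (isbn, fn, pPriceGross, url) are taken from that sample.

Code:


# Illustrative sketch only: pull the infoDataLayer JSON out of the captured
# <script> text with a regular expression, parse it, and write a few chosen
# fields to the CSV.
import csv
import json
import re

def extract_fields(script_text):
    # Grab the array assigned to infoDataLayer, i.e. everything between
    # "infoDataLayer = [" and the closing "];" (assumes that exact layout).
    match = re.search(r"infoDataLayer\s*=\s*(\[.*?\])\s*;", script_text, re.DOTALL)
    if not match:
        return None
    data = json.loads(match.group(1))[0]
    # Field names below come from the sample page source in the first post.
    return [data.get("isbn"), data.get("fn"), data.get("pPriceGross"), data.get("url")]

# all_scripts comes from the soup.find_all call in the Python block above.
row = extract_fields(str(all_scripts[2]))
if row:
    with open("data.csv", "a", newline="") as fopen:
        csv.writer(fopen).writerow(row)


The regex keeps things simple; if a page ever contained another "];" inside the array, a proper JSON-aware parse of the script text would be the safer choice.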

