Reading PDF file table and text


Post by Umya » Wed Sep 02, 2020 3:39 am

Hello,

Is it possible to read the contents of a PDF file, including tables and text, using MS v14? If so, I'd appreciate it if someone could provide a link or high-level steps.

Thanks!


Re: Reading PDF file table and text

Post by Dorian (MJT support) » Wed Sep 02, 2020 9:40 am

Yes, for this I use a combination of Macro Scheduler and PDFtoText (part of the xpdf command line tools).

You may need to read up on PDFtoText's options to extract tables specifically, but this example should get you started with the Macro Scheduler part.

Code:

//Set the file paths
Let>PDFFile=C:\Users\xb360\Downloads\Invoices.pdf
Let>OutputFile=%script_dir%\Output3.txt

//Delete Previous Output File
DeleteFile>OutputFile

//The path to PDFtoText
Let>FilePath=d:\MJT\xpdf-tools-win-4.00\bin64

//Run PDFtoText using the variables I set above
RunProgram>%FilePath%\pdftotext -table -layout "%PDFFile%" "%OutputFile%"

//Wait 2 seconds to give PDFtoText time to finish
Wait>2

//Open the output txt file in its associated application
ExecuteFile>OutputFile
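
If the fixed two-second wait turns out to be too short for larger PDFs, one possible variation is to have Macro Scheduler wait for pdftotext to exit and then check that the output file was created. This is only a sketch: it reuses the FilePath, PDFFile and OutputFile variables from the script above, and it assumes the RP_WAIT system variable and the IfFileExists block behave in your version as they do in the Macro Scheduler documentation.

```
//Assumption: RP_WAIT=1 makes the run command wait until the program exits
Let>RP_WAIT=1
RunProgram>%FilePath%\pdftotext -table "%PDFFile%" "%OutputFile%"

//Only open the result if pdftotext actually produced it
IfFileExists>%OutputFile%
  ExecuteFile>%OutputFile%
Else
  MessageModal>pdftotext did not create %OutputFile%
Endif
```

With this approach the Wait>2 line should no longer be needed, since the script does not continue until pdftotext has finished.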
It might be worth experimenting with -table and -layout to see which gives you the best result. I just tried each one individually and both combined, and for the PDF I tested I got better results with just -table.

Code:

//Set the file paths
Let>PDFFile=C:\Users\xb360\Downloads\Invoices.pdf
Let>OutputFile1=%script_dir%\Output3 table.txt
Let>OutputFile2=%script_dir%\Output3 layout.txt
Let>OutputFile3=%script_dir%\Output3 both.txt

//Delete Previous Output
DeleteFile>OutputFile1
DeleteFile>OutputFile2
DeleteFile>OutputFile3

//The path to PDFtoText
Let>FilePath=d:\MJT\xpdf-tools-win-4.00\bin64

//Run PDFtoText using the variables I set above
RunProgram>%FilePath%\pdftotext -table "%PDFFile%" "%OutputFile1%"
RunProgram>%FilePath%\pdftotext -layout "%PDFFile%" "%OutputFile2%"
RunProgram>%FilePath%\pdftotext -table -layout "%PDFFile%" "%OutputFile3%"

//Wait 2 seconds for PDFtoText to do its thing
Wait>2

//Open each output txt file in its associated application
ExecuteFile>OutputFile1
ExecuteFile>OutputFile2
ExecuteFile>OutputFile3
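
Once you have picked the option that suits your PDF, the text can be pulled back into the script for further processing instead of just being opened. The following is a rough sketch rather than a tested solution: it assumes OutputFile1 from the script above already exists, that pdftotext wrote it with CRLF line endings (the usual case on Windows, but worth checking), and the names PDFText, Lines and ThisLine are made up for illustration.

```
//Load the whole text file into a variable
ReadFile>%OutputFile1%,PDFText

//Split it into an array of lines: Lines_1, Lines_2, ... with Lines_count elements
Separate>PDFText,CRLF,Lines

//Loop through the lines, skipping the loop if nothing was read
If>Lines_count>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>ThisLine=Lines_%k%
    //Process ThisLine here, e.g. search it for an invoice number
  Until>k=Lines_count
Endif
```

How you split each line into table columns will depend on how pdftotext spaced the columns in your particular PDF, so inspect the output file first.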
