Hello,
Is it possible to read PDF file contents, including tables and text, using MS v14? If so, I'd appreciate it if someone could provide a link or high-level steps.
Thanks!
Reading PDF file table and text
- Dorian (MJT support)
Re: Reading PDF file table and text
Yes, for this I use a combination of Macro Scheduler and PDFtoText, which is part of the Xpdf command line tools.
You may have to learn more about PDFtoText usage in order to specifically extract tables, but this example should get you started with the Macro Scheduler part.
It might be worth experimenting with -table and -layout to see which gives you the best result. I just tried each one individually and both combined, and for the PDF I tested I got better results with just -table.
Code:
//Set the file paths
Let>PDFFile=C:\Users\xb360\Downloads\Invoices.pdf
Let>OutputFile=%script_dir%\Output3.txt
//Delete Previous Output File
DeleteFile>OutputFile
//The path to PDFtoText
Let>FilePath=d:\MJT\xpdf-tools-win-4.00\bin64
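//Run PDFtoText using the variables set above
//RP_WAIT=1 makes RunProgram wait until PDFtoText has exited
Let>RP_WAIT=1
RunProgram>%FilePath%\pdftotext -table -layout "%PDFFile%" "%OutputFile%"
//Short extra pause before opening the output
Wait>2
//Open the text file in its default application
ExecuteFile>OutputFile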
Code:
//Set the file paths
Let>PDFFile=C:\Users\xb360\Downloads\Invoices.pdf
Let>OutputFile1=%script_dir%\Output3 table.txt
Let>OutputFile2=%script_dir%\Output3 layout.txt
Let>OutputFile3=%script_dir%\Output3 both.txt
//Delete Previous Output
DeleteFile>OutputFile1
DeleteFile>OutputFile2
DeleteFile>OutputFile3
//The path to PDFtoText
Let>FilePath=d:\MJT\xpdf-tools-win-4.00\bin64
//Run PDFtoText once per option set, using the variables set above
//RP_WAIT=1 makes each RunProgram wait until PDFtoText has exited
Let>RP_WAIT=1
RunProgram>%FilePath%\pdftotext -table "%PDFFile%" "%OutputFile1%"
RunProgram>%FilePath%\pdftotext -layout "%PDFFile%" "%OutputFile2%"
RunProgram>%FilePath%\pdftotext -table -layout "%PDFFile%" "%OutputFile3%"
//Short extra pause before opening the output
Wait>2
//Open each text file in its default application
ExecuteFile>OutputFile1
ExecuteFile>OutputFile2
ExecuteFile>OutputFile3
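If you want to work with the extracted text inside the script rather than just open it, something along these lines might be a starting point. It is only a rough sketch: it assumes the output file from the first example already exists and uses Windows (CRLF) line endings, and the variable names FileText, Lines and ThisLine are just ones I made up for illustration.

```
//Assumption: this is the text file PDFtoText wrote in the first example
Let>OutputFile=%script_dir%\Output3.txt
IfFileExists>OutputFile
  //Read the whole file into a variable
  ReadFile>OutputFile,FileText
  //Split into lines: Lines_1, Lines_2, ... with the total in Lines_count
  Separate>FileText,CRLF,Lines
  If>Lines_count>0
    Let>k=0
    Repeat>k
      Let>k=k+1
      Let>ThisLine=Lines_%k%
      //Placeholder: replace this with whatever you need to do with each line
      MessageModal>ThisLine
    Until>k=Lines_count
  EndIf
Else
  MessageModal>Output file not found
EndIf
```

From there you could split each line into columns with another Separate, but the right delimiter depends on how your particular PDF comes out with -table or -layout, so check the text file first.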