PDF Content

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

[email protected]
Junior Coder
Posts: 25
Joined: Wed Jul 20, 2011 3:07 pm

Post by [email protected] » Wed Jul 09, 2014 4:50 pm

I need to find some content in a PDF doc. Anyone have any ideas?

Re: PDF Content

Post by [email protected] » Wed Jul 09, 2014 5:31 pm

[email protected] wrote: I need to find some content in a PDF doc. Anyone have any ideas?
I found a command line utility at https://pdfbox.apache.org/commandline/#extractText that extracts the text into a text file, which I then search for the content I'm looking for. This works for my needs, though it is probably not perfect for every need.
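For anyone wanting to script the search step outside Macro Scheduler, here is a rough Python sketch of it. It assumes the PDF text has already been extracted to a plain text file by some tool such as the one linked above. The function name `find_in_text_file` and the file name `extracted.txt` are made up for illustration and are not part of PDFBox or Macro Scheduler.

```python
def find_in_text_file(path, needle, ignore_case=True):
    """Return (line_number, line) pairs for every line containing needle.

    Assumes `path` is a plain text file produced by a PDF text extractor.
    """
    hits = []
    target = needle.lower() if ignore_case else needle
    # errors="replace" keeps the scan going if the extractor wrote odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            haystack = line.lower() if ignore_case else line
            if target in haystack:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # "extracted.txt" is a hypothetical output file name.
    for number, line in find_in_text_file("extracted.txt", "invoice"):
        print(f"{number}: {line}")
```

Matching line by line means a phrase that wraps across two lines in the extracted text will be missed, so searching for a short, distinctive string tends to work better.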

Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Re: PDF Content

Post by Marcus Tettmar » Thu Jul 10, 2014 6:12 am

If the PDF contains text, that text can be extracted, and there are a number of free/open source tools you can use. However, some PDFs are all images, e.g. documents that have been scanned, so they won't have any extractable text.
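One way to spot the image-only case in a script is to check whether the extractor produced any real text at all. The Python sketch below is only a heuristic. The function name `has_extractable_text` and the 20-character threshold are arbitrary assumptions for illustration, not something any particular tool defines.

```python
def has_extractable_text(text, min_chars=20):
    """Heuristic check that extracted PDF text is not effectively empty.

    Counts non-whitespace characters; `min_chars` is an arbitrary
    threshold chosen for illustration and may need tuning.
    """
    visible = sum(1 for ch in text if not ch.isspace())
    return visible >= min_chars


if __name__ == "__main__":
    scanned = " \n\x0c\n "  # e.g. only blank lines and page breaks
    print(has_extractable_text(scanned))
    print(has_extractable_text("Invoice number 12345, total due 45.00"))
```

If the check fails, the PDF is probably a scan and would need OCR before any text search can work.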



Djek
Pro Scripter
Posts: 147
Joined: Sat Feb 05, 2005 11:35 pm
Location: Holland

Re: PDF Content

Post by Djek » Thu Jul 10, 2014 7:59 am

Hi,

You can first open the PDF file with Adobe Reader,
then use "Save As" to save it as a .txt file,
then use msched to find your string in this file, which contains all the text.

kind regards,
Djek
