Automated OCR Validated for Use in Large Dutch Vaccine Study
When it comes to entering data about tens of thousands of patients at
various stages of a very large vaccine trial, it pays to think small about
how many times human hands should intervene.
That's sound advice from Keith Passaur, president and owner of eDocfile,
Inc. (Valrico, Fla.), who recently delivered Optical Character Recognition
(OCR) software that organizes and automates the filing of up to 4,000
documents daily for the world's second-largest vaccine trial.
The trial, now under way in the Netherlands, involves 85,000 patients at
several remote health centers and 340,000 documents. The hospital running
the trial had purchased eDocfile's File by OCR to help manage and file those
documents. File by OCR extracts text from PDF or TIFF files, parses it, and
uses the results to rename each file and move it into a folder hierarchy.
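As a rough illustration of the rename-and-relocate step, the Python sketch below turns parsed index values into a destination path and moves the file there. The `IndexFields` type, the `center_X/patient_Y/<type>_X_Y.pdf` layout, and the function names are hypothetical; File by OCR's actual naming rules are configured in the product and are not shown in this article.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IndexFields:
    """Index values parsed from a document's OCR text (hypothetical structure)."""
    center: str
    patient: str
    doc_type: str


def target_relpath(fields: IndexFields, ext: str = ".pdf") -> Path:
    """Build a relative path of the form center_X/patient_Y/<type>_X_Y<ext>.

    This layout is an illustrative assumption, not File by OCR's own scheme.
    """
    folder = Path(f"center_{fields.center}") / f"patient_{fields.patient}"
    # Repeat the key fields in the file name so a file is identifiable on its own.
    return folder / f"{fields.doc_type}_{fields.center}_{fields.patient}{ext}"


def file_document(src: Path, root: Path, fields: IndexFields) -> Path:
    """Rename and move src into the folder hierarchy under root."""
    dest = root / target_relpath(fields, src.suffix)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create folders on first use
    shutil.move(str(src), str(dest))
    return dest
```

For example, `target_relpath(IndexFields("012", "004567", "consent"))` yields `center_012/patient_004567/consent_012_004567.pdf`. Creating folders on demand means the hierarchy grows as documents for new centers and patients arrive.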
Passaur was tasked with modifying the OCR program so that study files
could be named from index information: patient number, center number,
and document type. He received a six-page tri-fold form to be filled out
by each patient. On the form's third page, a vertically printed,
OCR-readable number identified the patient and the center where the form
was generated.
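The article does not give the format of the vertical number, so the sketch below assumes a hypothetical layout: a 3-digit center number, an optional hyphen, and a 6-digit patient number. It also assumes that OCR on vertical text may return stray whitespace and common letter-for-digit confusions (such as `O` for `0`), and normalizes those before parsing. Anything that still fails to parse returns `None` so it can be set aside for manual filing.

```python
import re
from typing import Optional, Tuple

# Letters OCR engines commonly confuse with digits (illustrative, not exhaustive).
_OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "D": "0", "I": "1", "l": "1", "|": "1", "S": "5", "B": "8"}
)

# Hypothetical layout: 3-digit center, optional hyphen, 6-digit patient number.
_VERTICAL_NUMBER = re.compile(r"(\d{3})-?(\d{6})")


def parse_vertical_number(raw: str) -> Optional[Tuple[str, str]]:
    """Return (center, patient) from raw OCR text, or None if it does not parse."""
    # Vertical text often comes back split by spaces or line breaks.
    compact = "".join(raw.split()).translate(_OCR_DIGIT_FIXES)
    match = _VERTICAL_NUMBER.fullmatch(compact)
    if match is None:
        return None  # leave the document for manual filing
    return match.group(1), match.group(2)
```

With these assumptions, `"0 1 2\n0O4567"` parses to `("012", "004567")`, while a short read such as `"12-4567"` is rejected rather than guessed.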
Completed forms would be duplex-scanned at the remote centers, producing
a two-page TIFF file to send to the hospital. There, the scanned image
would be split into six individual pages and the vertical number
extracted for filing. The pages would then be reassembled into a
five-page PDF (page 6 was blank) and filed according to the OCR results.
All documents would be processed in a batch after scanning, freeing users
to move on to other tasks while the OCR process ran.
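The split-and-reassemble step can be sketched in plain Python by treating each scanned side as a grid of pixel values and cutting it into three vertical panels. The panel-to-page mapping in `PAGE_ORDER` is an assumption (front side holds pages 1 to 3, back side pages 4 to 6); the real order depends on how the tri-fold is printed and folded. `read_index` stands in for whatever OCR call reads the vertical number; it is a hypothetical parameter, not part of File by OCR.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Image = List[List[int]]  # rows of pixel values; a stand-in for a real bitmap
T = TypeVar("T")

# Hypothetical mapping from (scanned side, panel left-to-right) to form page.
PAGE_ORDER: Dict[Tuple[int, int], int] = {
    (0, 0): 1, (0, 1): 2, (0, 2): 3,
    (1, 0): 4, (1, 1): 5, (1, 2): 6,
}


def split_panels(side: Image, panels: int = 3) -> List[Image]:
    """Cut one scanned side into vertical panels, left to right."""
    width = len(side[0])
    # Integer boundaries cover every column even if width is not divisible.
    bounds = [width * i // panels for i in range(panels + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in side] for i in range(panels)]


def separate_pages(sides: Sequence[Image]) -> List[Image]:
    """Turn a two-sided duplex scan into the form's six pages, in page order."""
    pages: Dict[int, Image] = {}
    for s, side in enumerate(sides):
        for p, panel in enumerate(split_panels(side)):
            pages[PAGE_ORDER[(s, p)]] = panel
    return [pages[n] for n in sorted(pages)]


def process_scan(
    sides: Sequence[Image], read_index: Callable[[Image], T]
) -> Tuple[T, List[Image]]:
    """Read the index from page 3 and keep pages 1-5 for the output PDF."""
    pages = separate_pages(sides)
    index = read_index(pages[2])  # page 3 carries the vertical number
    return index, pages[:5]       # page 6 is blank and is dropped
```

In a production pipeline the cropping and PDF assembly would be done with an imaging library; the sketch only shows the ordering logic that such code would need.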
Writing Scripts to Save Time
Because every document would be processed the same way, Passaur used
Macro Scheduler, a Windows script-writing tool from MJT Net (Shaftesbury,
UK), to automate the command-line steps that drive the filing process.
Says Passaur: "Macro Scheduler allows us to automate very complicated
functions, such as parsing out OCR text content and batch renaming of
files." eDocfile has used Macro Scheduler for about seven years to
automate repetitive steps in Windows-based software the company
develops for its clients.
Once the steps are put together in the script, they can easily be modified
for use in other programs to automate similar actions and reduce the
likelihood of error. "There was no reason a highly paid staff member should
have to manually perform these steps for every document coming in,"
Passaur adds.
Macro Scheduler was used to automate steps throughout the OCR
filing process, he notes.
Each center had been assigned a specific range of patient numbers, so to
catch missing or misfiled documents, Passaur created a macro that
validated the OCR by checking each patient number read from a form against
the range assigned to that center. And because every file must be
accounted for, he wrote a macro that extracts the patient number, center
number, and trial stage from an Excel spreadsheet for validation.
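Both checks reduce to simple set and range operations. Passaur implemented them as Macro Scheduler macros; the Python sketch below is a substitute illustration, not his code. It assumes the spreadsheet has been exported to CSV with hypothetical column names `center`, `patient`, and `stage`, and that each center's assigned patient numbers form a contiguous `range`.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str, str]  # (center, patient, trial stage)


def out_of_range(filed: Iterable[Key], ranges: Dict[str, range]) -> List[Key]:
    """Return filed entries whose patient number is outside the center's range."""
    suspect = []
    for center, patient, stage in filed:
        allowed = ranges.get(center)
        # Unknown center, non-numeric read, or number outside the assigned range.
        if allowed is None or not patient.isdecimal() or int(patient) not in allowed:
            suspect.append((center, patient, stage))
    return suspect


def load_expected(csv_text: str) -> Set[Key]:
    """Read expected documents from a CSV export of the tracking spreadsheet.

    The column names center/patient/stage are hypothetical.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {(r["center"], r["patient"], r["stage"]) for r in rows}


def reconcile(expected: Set[Key], filed: Iterable[Key]) -> Tuple[Set[Key], Set[Key]]:
    """Return (missing, unexpected): expected-but-not-filed, filed-but-not-expected."""
    filed_set = set(filed)
    return expected - filed_set, filed_set - expected
```

A range check catches OCR misreads that still look like valid numbers, while reconciliation against the master list catches documents that never arrived or were filed under the wrong key.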
Because OCR is not 100% accurate, Passaur used Macro Scheduler to write
scripts that test and retest the captured data for errors. As a result,
only 1 in every 1,000 documents has to be filed manually.
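The article does not describe what the retest scripts do internally. One plausible pattern, sketched below in Python as an assumption, is to try several OCR passes (for example with different settings), accept the first result that passes every validation rule, and route the document to a manual queue only when all passes fail. `OcrPass`, `Check`, `capture`, and `route` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

OcrPass = Callable[[bytes], str]  # one OCR attempt, e.g. with different settings
Check = Callable[[str], bool]     # one validation rule on the captured text


def capture(image: bytes, passes: Sequence[OcrPass],
            checks: Sequence[Check]) -> Optional[str]:
    """Return the first OCR result that passes every check, or None."""
    for ocr in passes:
        text = ocr(image)
        if all(check(text) for check in checks):
            return text
    return None


def route(docs: Sequence[Tuple[str, bytes]], passes: Sequence[OcrPass],
          checks: Sequence[Check]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split documents into auto-filed (name, text) pairs and a manual queue."""
    auto: List[Tuple[str, str]] = []
    manual: List[str] = []
    for name, image in docs:
        text = capture(image, passes, checks)
        if text is None:
            manual.append(name)  # every attempt failed validation
        else:
            auto.append((name, text))
    return auto, manual
```

Checks such as the range and reconciliation tests described earlier can be plugged in as `checks`; the stricter they are, the fewer misfiled documents slip through, at the cost of a slightly longer manual queue.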
The program, modified with Macro Scheduler, was installed in fall 2009,
and the complete process was validated by an external auditing company.
The hospital now processes 1,300 documents and more than 2,600 faxes
daily; staff handle the three or four daily failures with the manual
filing tools built into the software.
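Taking the reported figures at face value, the daily volume and the 1-in-1,000 failure rate agree with each other and with the figure of up to 4,000 documents a day:

```latex
\begin{aligned}
\text{daily volume} &\approx 1{,}300 + 2{,}600 = 3{,}900 \\
\text{expected manual filings per day} &\approx 3{,}900 \times \tfrac{1}{1{,}000} \approx 3.9
\end{aligned}
```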
"With that volume, it pays to keep human hands where they belong, with
patients, and not keying in file codes," notes Passaur.
For more information on File by OCR, visit http://www.edocfile.com/. For
information on Macro Scheduler, visit http://www.mjtnet.com/