If you have the actual PDF file, not just a hard copy, you can extract the text directly from the PDF without using OCR, provided the PDF contains real text rather than scanned page images. This should produce far fewer errors than optical character recognition.
As an example, you can copy and paste text from a PDF into another document, such as MS Word.
I wouldn't be surprised to find that some enterprising programmer has come up with a widget that allows you to extract data into an Excel or text file.
But document format will always be an issue. It is possible to import data and then use a script to "clean up" the imported data, but the odds of that succeeding for you depend on how consistent the document format is from one file to another.
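To give a rough idea of what such a clean-up script might look like, here is a minimal Python sketch. The function name `clean_extracted_text` and the rules it applies (rejoining hyphenated words, collapsing whitespace, dropping "Page N" lines) are my own assumptions about a typical layout, not something taken from a particular tool. You would have to adjust the rules to whatever your own files look like.

```python
import re


def clean_extracted_text(raw: str) -> list[str]:
    """Normalize text pasted or exported from a PDF into clean lines."""
    # Rejoin words split across a line break: "docu-\nment" -> "document".
    # Caveat: this also joins real hyphenated words that happen to break
    # at the hyphen (e.g. "well-\nknown" becomes "wellknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    cleaned = []
    for line in text.splitlines():
        # Collapse runs of spaces and tabs into one space, trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Skip blank lines.
        if not line:
            continue
        # Skip page footers such as "Page 3" or "Page 3 of 10".
        # ASSUMPTION: footers look like this; change the pattern to suit.
        if re.fullmatch(r"Page \d+( of \d+)?", line):
            continue
        cleaned.append(line)
    return cleaned
```

You would call it on the text of each file and then import the resulting lines. The more the layout varies from file to file, the more special cases a script like this needs, which is exactly why consistency matters.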
I would also be interested in doing something similar. The best way to capture text for OCR processing is to use a scanner. It would be great, though, if there were a plugin that would process photos of text stored in container fields with Abbyy text grabber, in the way that Go is integrated with CNS barcode reader. But the camera on the iPhone isn't up to the job. However, there are attachable lenses that allow macro (micro?) photos to be taken. This could be a workaround.
Does anyone know of a plugin that could batch-process these photos with OCR after they have been put in the container fields?
I just came across OCR technology: "Automated document recognition and processing of forms, using optical character recognition - OCR to read text in scanned documents, and optical mark recognition (OMR) to read checkboxes, or bubble sheets for automated testing and grading". Some basic features it can achieve:
* OCR text from scanned, image-based document files
* Search or extract text characters from scanned PDF or other document files, or even create a searchable PDF file
* Modify text and images within the scanned document files