    Data Capture vs OCR Software



      I’m appealing to all my techie contacts for help with finding the right software to solve a problem.  I don’t know if one software product can accomplish the two tasks we need done, but I think I’m looking for either an OCR product or a Data Capture product or both.

      The first task we need to address is taking existing PDFs and match importing their data into our FileMaker Pro database.  I’ve experimented with OCR software (ABBYY FineReader Express for Mac) to convert the PDFs to Excel spreadsheets.  The problem is that while it works fine with PDFs already laid out as a table or spreadsheet, it doesn’t work well with PDFs in other formats.  For example, we have tradeshow attendance directories that are laid out in non-uniform columns.  The OCR software puts the city, state, and zip all in one cell, and likewise the first name, last name, and title in one cell.  This won’t work for database importing (unless there is a way around that).

      The second task we have is similar, but I don’t think OCR software will work.  We have access to web-based membership lists that we also want to match import into our database.  They can’t be downloaded, and printing them as PDFs leads to the same problem as above.  I’m not experienced with data capture software, but from what I’ve read it seems like it could work.  I just can’t find one for Mac with a demo that I can test.  ListGrabber (http://www.egrabber.com/listgrabberstandard/) looks like the right type of product, but they don’t have a Mac version.

      I would greatly appreciate any insight or suggestions!

      Many Thanks!

        • 1. Re: Data Capture vs OCR Software

               If you have the actual PDF document, not just a hard copy, you can extract text from the PDF without using OCR (provided the PDF contains real text rather than scanned page images). This should produce far fewer errors than optical character recognition.

               As an example you can copy and paste text from a PDF into another document--such as MS Word.

               I wouldn't be surprised if some enterprising programmer has come up with a widget that allows you to extract the data into an Excel or text file.

               But document format will always be an issue. It is possible to import the data and then use a script to "clean up" the imported data, but the odds of that succeeding for you depend on how consistent the document format is from one file to another.
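
               As a rough illustration of that kind of clean-up step, here is a minimal Python sketch that splits the combined cells before import. It assumes US-style cells such as "City, ST 12345" and "First Last, Title"; the function names and both layouts are my own assumptions, not something taken from your files. Cells that don't fit return None so they can be fixed by hand.

```python
import re

# Assumed layout: "City, ST 12345" or "City, ST 12345-6789" (US-style addresses).
CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>.+?),\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_city_state_zip(cell):
    """Split one combined cell into (city, state, zip), or None if it does not match."""
    match = CITY_STATE_ZIP.match(cell)
    if match is None:
        return None
    return (
        match.group("city").strip(),
        match.group("state").upper(),
        match.group("zip"),
    )


def split_name_title(cell):
    """Split an assumed "First Last, Title" cell into (first, last, title).

    Everything after the first word of the name is treated as the last name,
    which is a simplification (middle initials end up in the last name).
    """
    name, _, title = cell.partition(",")
    parts = name.split()
    if len(parts) < 2:
        return None
    return parts[0], " ".join(parts[1:]), title.strip()


if __name__ == "__main__":
    print(split_city_state_zip("Santa Clara, CA 95054"))
    print(split_name_title("Jane Doe, Marketing Director"))
```

               The more the layout varies from one directory to the next, the more rows will come back as None, which is a quick way to see whether a script like this is worth the effort for a given file.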

          • 2. Re: Data Capture vs OCR Software

                 I would also be interested in doing something similar. The best way to capture text for OCR processing is to use a scanner. It would be great, though, if there were a plugin that would process photos of text stored in container fields with ABBYY TextGrabber, in the way that Go is integrated with CNS barcode reader. But the camera on the iPhone isn't up to the job. However, there are attachable lenses that allow macro (micro?) photos to be taken. This could be a workaround.

                 Does anyone know of a plugin that could batch OCR process these photos after they have been put in the container fields?

            • 3. Re: Data Capture vs OCR Software
                   I just came across OCR technology: "Automated document recognition and processing of forms, using optical character recognition - OCR to read text in scanned documents, and optical mark recognition (OMR) to read checkboxes, or bubble sheets for automated testing and grading". Some of the basic things it can do:
                   * OCR text from scanned, image-based document files
                   * Search or extract text from scanned PDFs or other document files, or even create a searchable PDF file
                   * Modify text and images within the scanned document files