    Import data from PDF


      Hi all,


      I want to import text from PDF documents.  The PDF documents aren't images; they contain actual text, and they are fairly standard.

      This is the first time I am attempting to do this in FileMaker.


      Could others please let me know which methods they have used?  A plugin?  An online service?  Please also say which product you used and how well it worked for you.

      Basically I am after recommendations.

          It probably depends a lot on the number of documents and on your environment.


          We had a situation where we had a pile of TV scripts as PDFs.  It made good sense to open each one, Select All, Copy, click into a text field in the FMP database and Paste.  Absurdly simple but effective.


          Reasons:  we actually had an interest in looking at each record/script.  There weren't a massive number of records, but each script might contain, say, 20 pages.  Also, the OCR results (from the original paper script to the PDF) were variable, so this gave us an opportunity to assess the output.


          We also had a container field for the PDF source.  OCR'd text in the PDF was as searchable as the text in the 'dump field', but the dump field allowed for correction, time permitting.

            Markus Schneider

            You might check the MBS Plugin. For OS X there are some standard functions; for Windows (and OS X) there is an integration with DynaPDF.

              Some users have had some success using the Base64 functions to pull text data out of a PDF. Base64Decode ( Base64Encode ( Table::pdfContainerField ) ) (with no file name specified) can pull out some text data.
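To make the idea concrete outside FileMaker: once the raw bytes of a PDF are available as text, any text drawn from an uncompressed content stream shows up as parenthesised strings in front of a `Tj` operator. The following is only a minimal Python sketch of that scan. The sample bytes and the function name `literal_strings` are made up for illustration, and the pattern assumes simple strings with no escaped or nested parentheses.

```python
import re

# Hypothetical, hand-written fragment of an UNCOMPRESSED PDF content stream.
raw = (
    b"BT /F1 12 Tf 72 720 Td (Invoice 1042) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Total: 99.50) Tj ET\n"
)

def literal_strings(data):
    """Return the (...) string operands that directly precede a Tj operator."""
    # Assumption: no escaped or nested parentheses inside the strings.
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", data)
    return [m.decode("latin-1") for m in matches]

print(literal_strings(raw))  # ['Invoice 1042', 'Total: 99.50']
```

This only finds text that is stored uncompressed and drawn with plain `Tj` strings; it is not a general PDF text extractor.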

                Interesting technique, but it only returns the binary data as text, as is, so it can be applied to a very limited set of (unoptimized) PDFs. Many PDFs store their text compressed, and it needs to be decoded (expanded) first.
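As a rough illustration of the point about compressed text: PDF content streams are commonly compressed with the Flate (zlib) filter, so the words do not appear in the raw bytes until the stream is inflated. The Python sketch below builds a small hypothetical PDF object and inflates it. The helper name `inflate_streams` and the sample object are assumptions for illustration, and the stream detection is deliberately simplistic (it ignores `/Length`, other filters, and encryption).

```python
import re
import zlib

def inflate_streams(pdf_bytes):
    """Yield the inflated bytes of every stream that zlib can decompress."""
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the end-of-line bytes left before "endstream".
            yield zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not zlib data (uncompressed or another filter): skip it

# Hypothetical PDF object whose content stream is Flate-compressed.
content = b"BT /F1 12 Tf 72 720 Td (Hello from a compressed stream) Tj ET"
packed = zlib.compress(content)
sample = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)

print(b"Hello" in sample)             # is the text visible in the raw bytes?
print(list(inflate_streams(sample)))  # the readable content after inflating
```

Even after inflating, the result is still PDF drawing operators rather than plain text, so a further parsing step would be needed.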

                  I am using the MBS plugin for pulling data from PDFs. Other plugins offer similar functionality, but I have no experience to share for those. Many plugins can be tested for free. Give one a try :-)

                    Compressed text could be decoded, too, though it may be easier to do that from the un-decoded Base64 than from the text. In that case, though, it may be less work to use another method until someone else bites that bullet first.


                    Carl hasn't said where the PDF is coming from. If the PDFs in question are currently using compressed text, but Carl has the necessary control over their generation, he might declare that the PDFs should use only uncompressed text for ease of decoding. On the other hand, if Carl had that control over the PDFs, I imagine he wouldn't need FileMaker to pull the text contents back out of them again.

                      360Works Scribe


                      or ScriptMaster, and roll your own function (with iText or IcePDF)

                        I don't have control over the source, but they are simple PDFs.  I'm going to examine this route, thanks!

                          I actually really like this idea!  If my client needs to do a visual check anyway, this may work well.  I'm going to test this workflow.

                            Fulvio Di Rosa

                            Hi, I'm new to FileMaker and had the same problem, but I solved it with 360Works Scribe.

                            This is the script I attached to a button to try the plug-in (script steps translated from the Italian version of FileMaker):

                            Set Variable [ $w ; Value: ScribeDocLoad( "/Users/iMac27/Desktop/Example.pdf" ) ]

                            Set Variable [ $w ; Value: ScribeDocWriteValue( "x_Name" ; Customers::Name ) ]

                            Set Variable [ $w ; Value: ScribeDocWriteValue( "x_Surname" ; Customers::Surname ) ]


                            ........ other fields merged between the "Example" PDF fields and the "Customers" table .........


                            Set Variable [ $w ; Value: ScribeDocSaveFile( "/Users/iMac27/Desktop/Example_new.pdf" ; "flatten=true" ) ]

                            Hope this helps. Best regards

                              For my needs, Adobe's online conversion to Excel worked.  I then imported the Excel worksheet.

                                Just a heads-up: copying and pasting from a PDF is not a good idea for multi-platform deployments.  The text comes out ordered differently depending on the application used to copy and paste.

                                  Huh!  Good to know and sorry to hear that.  In our case, it was a pure Windows 7 environment. (FMS 12, FMP 12)


                                  Workflow: scan and OCR the text, insert the PDF into the FMP container field, select all, copy, click into the text field, paste.  Note: we only used FMP for the copy/paste, not an additional app.