13 Replies Latest reply on Dec 14, 2015 2:36 PM by mardikennedy

    Import data from PDF

    CarlSchwarz

      Hi all,

       

      I want to import text from PDF documents.  The PDF documents aren't images; they have text in them, and they are fairly standard.

      This is the first time I am attempting to do this in FileMaker.

       

      Could others please let me know what methods they have used?  Plugin?  Online service?  Please also let me know which one you used, along with some feedback.

      Basically I am after recommendations.

        • 1. Re: Import data from PDF
          mardikennedy

          It probably depends a lot on numbers and environment.

           

          We had a situation where we had a (pdf) pile of scripts (TV).  It made good sense to open each one, Select All, Copy, click into a text field in the FMP database and Paste.  Absurdly simple but effective.

           

          Reasons:  we actually had an interest in looking at each record/ script.  There weren't a massive number of records, but each script might contain, say, 20 pages.  Also, the OCR results (from the original paper script to the pdf) were variable so this gave us an opportunity to assess the output.

           

          We did also have a container field for the pdf source.  OCR'd text in the pdf was as searchable as the text in the 'dump field', but the dump field allowed for correction, time permitting.

          • 2. Re: Import data from PDF
            Markus Schneider

            You might check the MBS Plugin. For OS X there are some standard functions; for Windows (and OS X) there is an integration with DynaPDF.

            • 3. Re: Import data from PDF
              jbante

              Some users have had some success using the Base64 functions to pull text data out of a PDF. Base64Decode ( Base64Encode ( Table::pdfContainerField ) ) (with no file name specified) can pull out some text data.
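To illustrate why this round trip can work on simple PDFs, here is a minimal sketch in Python rather than FileMaker. It assumes a hand-written, uncompressed PDF fragment; `pdf_bytes` and `literal_strings` are hypothetical names, and the pattern only catches plain `( ... ) Tj` text operands, which many real PDFs do not use exclusively.

```python
import base64
import re

# Hypothetical sample: a fragment of an uncompressed PDF content stream.
pdf_bytes = (
    b"%PDF-1.4\n"
    b"4 0 obj\n<< >>\nstream\n"
    b"BT /F1 12 Tf 72 720 Td (Hello from a PDF) Tj ET\n"
    b"endstream\nendobj\n"
)

# Encoding and then decoding returns the original bytes unchanged,
# which is the same idea as Base64Decode ( Base64Encode ( container ) ).
round_tripped = base64.b64decode(base64.b64encode(pdf_bytes))


def literal_strings(data):
    """Return the simple (...) operands that precede a Tj operator."""
    found = re.findall(rb"\(([^()\\]*)\)\s*Tj", data)
    return [item.decode("latin-1") for item in found]


print(literal_strings(round_tripped))
```

The round trip itself adds nothing; it only exposes the file's bytes as text, so whatever is readable in the raw file is all that can be recovered this way.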

              • 4. Re: Import data from PDF
                user19752

                Interesting technique, but it only returns the binary data as text, as is, so it can only be applied to a very limited set of (unoptimized) PDFs. Many PDFs have compressed text, which needs to be decoded (expanded) first.
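A rough Python sketch of that limitation, under stated assumptions: the sample stream is built by hand with zlib (the compression behind the PDF `/FlateDecode` filter), and `stream_payloads` and `inflate_or_raw` are hypothetical helper names. Real PDFs may use other filters, encryption, or font encodings that this does not handle.

```python
import re
import zlib


def stream_payloads(pdf_bytes):
    """Yield the raw bytes between each 'stream' and 'endstream' keyword."""
    pattern = rb"stream\r?\n(.*?)\nendstream"
    for match in re.finditer(pattern, pdf_bytes, re.DOTALL):
        yield match.group(1)


def inflate_or_raw(payload):
    """Inflate a Flate-compressed payload; return it unchanged otherwise."""
    try:
        return zlib.decompress(payload)
    except zlib.error:
        return payload


# Hypothetical sample: one content stream, compressed the way /FlateDecode expects.
content = b"BT /F1 12 Tf 72 720 Td (Compressed text) Tj ET"
pdf_bytes = (
    b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n"
)

# The readable string is not present in the raw bytes ...
visible_raw = b"Compressed text" in pdf_bytes

# ... but it reappears once each stream has been inflated.
decoded = [inflate_or_raw(p) for p in stream_payloads(pdf_bytes)]
visible_decoded = any(b"Compressed text" in d for d in decoded)

print(visible_raw, visible_decoded)
```

Falling back to the raw payload when inflation fails lets the same loop handle files that mix compressed and uncompressed streams.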

                • 5. Re: Import data from PDF
                  TorstenBernhard

                  I am using the MBS plugin for pulling data from PDFs. Other plugins offer similar functionality, but I have no experience to share for those. Many plugins can be tested for free. Give them a try :-)

                  • 6. Re: Import data from PDF
                    jbante

                    Compressed text could be decoded, too, though it may be easier to do that with the un-decoded base 64 than the text. Though in that case it may be less work to use another method until someone else bites that bullet first.

                     

                    Carl hasn't said where the PDF is coming from. If the PDFs in question are currently using compressed text, but Carl has the necessary control over their generation, he might declare that the PDFs should use only uncompressed text for ease of decoding. On the other hand, if Carl had that control over the PDFs, I imagine he wouldn't need FileMaker to pull the text contents back out of them again.

                    • 7. Re: Import data from PDF
                      jrenfrew

                      360Works Scribe

                       

                      or ScriptMaster and roll your own function (with iText or IcePDF)

                      • 8. Re: Import data from PDF
                        CarlSchwarz

                        I don't have control over the source, but they are simple PDFs.  I'm going to examine this route, thanks!

                        • 9. Re: Import data from PDF
                          CarlSchwarz

                          I actually really like this idea!  If my client needs to do a visual check anyway, this may work well.  I'm going to test this workflow.

                          • 10. Re: Import data from PDF
                            Fulvio Di Rosa

                            Hi, I'm new to FileMaker and had the same problem, but I solved it with 360Works Scribe.

                            This is the script I attached to a button to try the plug-in (script steps translated from the Italian-language version of FileMaker):

                            Set Variable [$w ; Value: ScribeDocLoad( "/Users/iMac27/Desktop/Example.pdf" ) ]

                            Set Variable [$w ; Value: ScribeDocWriteValue( "x_Name" ; Customers::Name) ]

                            Set Variable [$w ; Value: ScribeDocWriteValue( "x_Surname" ; Customers::Surname ) ]

                             

                            ........ Other fields merged between the "Example" PDF fields and the "Customers" table  .........

                             

                            Set Variable [$w ; Value: ScribeDocSaveFile( "/Users/iMac27/Desktop/Example_new.pdf" ; "flatten=true" ) ]

                            Hope this helps. Best regards

                            • 11. Re: Import data from PDF
                              TKnTexas

                              For my need, Adobe's online conversion to Excel worked.  Then I imported the Excel worksheet.

                              • 12. Re: Import data from PDF
                                CarlSchwarz

                                Just a heads up: copy-pasting from a PDF is not a good idea for multi-platform deployments.  The text comes out ordered differently depending on the application used to copy and paste.

                                • 13. Re: Import data from PDF
                                  mardikennedy

                                  Huh!  Good to know and sorry to hear that.  In our case, it was a pure Windows 7 environment. (FMS 12, FMP 12)

                                   

                                  Workflow: scan and OCR the text, insert the PDF into the FMP container field, select all, copy, click into the text field, paste.  Note: we only used FMP for the copy/paste, not an additional app.