5 Replies Latest reply on Apr 25, 2009 8:11 AM by comment_1

    importing data from word documents





      I have a database with fields such as name, record number, LA, Ao and PA (the last three are number fields), etc.


      I have many Word documents, one per person. Each contains these labels, each followed by its value (either one word or a number). The documents are mostly free text, but every value is preceded by one of the labels above.


      Is there any way to import this data into FMP, either directly or via some intermediary application?


      Thanks. And I know I just posted a question right below this one; I promise these are my only two questions for now.

        • 1. Re: importing data from word documents

          GW wrote:


          ...  each has these labels in them with data followng them

          Yes, you can import the whole Word document into FMP and then parse it.


          Can you post one of the Word documents here (with a fake name and data, if necessary)?

          • 2. Re: importing data from word documents



            Here is a sample with fake data.


            "Subject: john doe             Date of Study: 1/1/2001
            MRN:000001                    Referring:smith
            Indication: sample              Disk: CDVD#111

            Measurements (normals)
            LVEF 55%(56-78) LVSV 55ml(33-97) LVEDV 100ml(52-141)
            LA 25mm(21-39) MPA 20mm(16-33) LPA 20mm(12-23) RPA 20mm(12-23)
            Ascending Aorta  22mm(22-38) Descending Aorta 22mm(14-26)
            IVS 5mm(6-12) LVEDD  55mm(37-54) LVLW 5mm(5-11)
            RVEF 55%(47-80) RVSV 55ml(35-98) RVEDV  100ml(58-154)
            Indexed Values (normals)
            BSA  2.0m2
            LVEDV/BSA 50ml/m2(41-81) RVEDV/BSA 50ml/m2 (48-87)


            all normal. etc



            all normal. etc


            1.  all normal"


            So the data at the top (the name and all the number fields) should be imported into the respective fields.

            The text under structure, function and impression will be imported as is into free-text fields in FMP.



            • 3. Re: importing data from word documents

              I don't know of a way to import Word documents directly. However, if you can batch-convert them to plain text files with a .txt extension and place all of them in a single folder, you can then import them all at once (File > Import Records > Folder). There will be a record for each file, with all of the text in a single field.


              After that, you can begin parsing the data out to individual fields. Here, much depends on how consistent the format is (the example doesn't tell us that).
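If the header is as consistent as the sample suggests, its "Label: value" pairs can be picked out by label name. Below is a minimal Python sketch of that step (one possible intermediary, as the original question allowed for). It assumes the six labels from the sample, and that a value ends at the next run of two or more spaces or at the end of the line; `HEADER_LABELS` and `parse_header` are hypothetical names, not part of FileMaker.

```python
import re

# Hypothetical label list, copied from the sample report; adjust to the real files.
HEADER_LABELS = ["Subject", "Date of Study", "MRN", "Referring", "Indication", "Disk"]


def parse_header(text: str) -> dict:
    """Return {label: value} for each header label; None if the label is absent.

    Assumption: a value ends at the next run of 2+ whitespace characters
    or at the end of its line, as in the sample.
    """
    fields = {}
    for label in HEADER_LABELS:
        # Label, colon, optional spaces, then the shortest text up to a wide gap or end of line.
        pattern = re.escape(label) + r":\s*(.*?)(?=\s{2,}|$)"
        match = re.search(pattern, text, re.MULTILINE)
        fields[label] = match.group(1).strip() if match else None
    return fields
```

An empty value followed directly by the next label on the same line would confuse this pattern, which is one of the consistency questions worth checking against the real documents.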

              • 4. Re: importing data from word documents

                Batch-converting .doc to .txt is no problem,

                and I understand how you suggest importing them.

                But when you say to parse the data out into individual fields, could you explain how to do that in an automated fashion?


                I have about 6000 documents currently. Each looks exactly like the example I posted, and each is for a separate person.


                Again, thanks for the advice.

                • 5. Re: importing data from word documents
                  Well, you have at least three different "models" here (it's hard for me to see what the actual characters are), so you may need a slightly different calculation for each.

                  The generic formula for extracting text between known prefix and suffix is:

                  Let ( [
                  start = Position ( text ; prefix ; 1 ; 1 ) + Length ( prefix ) ;
                  end = Position ( text ; suffix ; start ; 1 )
                  ] ;
                  Middle ( text ; start ; end - start )
                  )
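The same prefix/suffix logic can also run outside FileMaker, in a script acting as the intermediary application. A rough Python counterpart follows; `extract_between` is a hypothetical name. Unlike the formula above, it explicitly returns None when the prefix or suffix is not found (Position returns 0 in that case, which the formula does not check).

```python
def extract_between(text: str, prefix: str, suffix: str):
    """Python counterpart of the Let/Position/Middle formula.

    Returns the text between the first occurrence of prefix and the next
    occurrence of suffix after it, or None if either one is missing.
    """
    start = text.find(prefix)          # 0-based here; Position is 1-based
    if start == -1:
        return None
    start += len(prefix)               # first character after the prefix
    end = text.find(suffix, start)     # look for the suffix only after the prefix
    if end == -1:
        return None
    return text[start:end]             # same as Middle ( text ; start ; end - start )
```

For example, `extract_between("LA 25mm(21-39) MPA 20mm(16-33)", "MPA", "(")` returns `" 20mm"`, leading space included, so a `.strip()` plays the role of FileMaker's Trim.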

                  So for example to extract the LVEDD value, you could use:

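                  Let ( [
                  start = Position ( text ; "LVEDD" ; 1 ; 1 ) + Length ( "LVEDD" ) ;
                  end = Position ( text ; "(" ; start ; 1 )
                  ] ;
                  Trim ( Middle ( text ; start ; end - start ) )
                  )


                  This uses the opening parenthesis of the normal range as the suffix, so for the sample it returns "55mm"; Trim removes any spaces between the label and the value. A single space would not work as the suffix here: Position would find the space right after "LVEDD", and the result would be empty. You could also use "LVLW" as the suffix, provided that LVLW always follows LVEDD on the same line.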
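Instead of one calculation per value, a regular expression can match every "label value unit(low-high)" group in one pass. This Python sketch is an alternative technique, not the calculation above; the pattern, the unit list (%, mm, ml, ml/m2) and the name `parse_measurements` are assumptions read off the sample, so values without a normal range (such as BSA) are skipped.

```python
import re

# Hypothetical pattern for "LABEL value unit(low-high)", based only on the sample.
MEASUREMENT = re.compile(
    r"([A-Za-z][A-Za-z/ ]*?)"     # label, e.g. "LVEDD", "LVEDV/BSA" or "Ascending Aorta"
    r"\s+(\d+(?:\.\d+)?)"         # numeric value
    r"\s*(%|mm|ml/m2|ml)"         # unit; "ml/m2" must be tried before "ml"
    r"\s*\((\d+)-(\d+)\)"         # normal range in parentheses
)


def parse_measurements(text: str) -> dict:
    """Map each label to (value, unit, low, high)."""
    results = {}
    for label, value, unit, low, high in MEASUREMENT.findall(text):
        results[label] = (float(value), unit, int(low), int(high))
    return results
```

Keeping the unit and normal range alongside the value makes it easy to spot rows where the layout differed from the sample, in the spirit of checking results before the final import.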

                  I'd suggest you import the data into a temp file and start adding calculation fields there - one for each value. Check the results and fine-tune the formulae. Once you are satisfied, import the data into its final destination.
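The same "extract, check, then import" workflow can be prepared outside FileMaker for all 6000 files. A minimal Python sketch, assuming the Word files have already been converted to .txt in one folder: it writes one CSV row per file and returns the names of files where a value was not found, so those can be checked by hand before the CSV is imported into FileMaker. The field list, the function names and the UTF-8 encoding are assumptions; converted Word text may well use another encoding.

```python
import csv
import re
from pathlib import Path

# Hypothetical subset of the labels in the sample report.
FIELDS = ["MRN", "LA", "LVEDD", "LVEF"]


def extract_value(text: str, label: str):
    """Return the number after label (as text), or None if not found.

    Assumes the sample's layouts 'MRN:000001' and 'LA 25mm(21-39)'.
    The lookbehind stops 'VEF' matching inside 'RVEF' and similar.
    """
    pattern = r"(?<![A-Za-z/])" + re.escape(label) + r"[: ]\s*(\d+(?:\.\d+)?)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def folder_to_csv(folder: Path, out_csv: Path) -> list:
    """Write one CSV row per .txt file; return names of files with missing values."""
    incomplete = []
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file"] + FIELDS)
        for path in sorted(folder.glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            values = [extract_value(text, label) for label in FIELDS]
            writer.writerow([path.name] + ["" if v is None else v for v in values])
            if None in values:
                incomplete.append(path.name)
    return incomplete
```

Values are written as text so that MRN keeps its leading zeros; a plain number such as "25" should import cleanly into a number field.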