8 Replies Latest reply on Aug 16, 2012 9:54 AM by dreed

    First post here.  Import of Folder of Text Files

    dreed

      Hello,

       

      I've been away from Filemaker for a while, but recently had reason to use it for processing some text based files. The files are .sdf files which are text files containing the atomic coordinates of 2D chemical structures along with a bunch of metadata fields about the molecule. The file may contain any number of individual structure records, delimited by a line containing only "$$$$"

       

      My task is to split these records into individual filemaker records and then parse out the desired metadata and structure data into individual fields. I've pretty much got everything working, except that I now realize that when using the Folder Import to get the whole contents of the file into a temp field for processing, Filemaker is doubling all the return/line feed characters. I tried all the encoding options in the import dialog with no difference in result. The extra returns really only matter when further using the structure coordinates with other programs that interpret this as .mol text. The extra return characters don't work there.

       

      So I see two options to fix this problem:

       

      1. Copy and paste the text contents into the temp field rather than using Import Folder (inelegant because I would like to just have a drop folder to pull new files from)

      2. Get rid of the doubled return characters after import, either in the temp field, or in the extracted .mol text. I can't figure out the second option as my attempt to strip using

       

      Substitute ( FieldName; "¶¶"; "¶")

       

      didn't work. I'm not sure how to identify the pattern of two returns in a row. Patterncount(field;"¶") finds all the returns, but Patterncount (field;"¶¶") finds nothing.

       

      I suppose a third option is to find another way to pull in the text files that doesn't use import folder or copy/paste, like a plugin, but I'd rather leave that option for last...

       

      The file is attached here, thanks for looking!

        • 1. Re: First post here.  Import of Folder of Text Files
          richardsrussell

          This is a straw you may clutch at. I haven't looked at your sample file, so I have no idea whether it's true, but I remember from back in the early days of computing, when I was still paying attention to stuff like ASCII coding, I wondered why there were 2 separate characters for "carriage return" and "line feed". I think that FileMaker's "" symbol corresponds only to CR, and perhaps your file has been adopting invisible LFs.

          • 2. Re: First post here.  Import of Folder of Text Files
            dreed

            Hi Richard, thanks for the reply.

             

            so unless I know how to represent the LF in the calculation, I can't really deal with them.

             

            The original text files certainly have both CR and LF at the end of each line.  This is true for the SD files and for any random text file I create with NotePad on Windows.  The text imported into the field, however, has 2 CRLF sequences everywhere there used to be just one.  This is true for both .sdf files and any text file I try to import. 

            • 3. Re: First post here.  Import of Folder of Text Files
              PalmDBS

              CRLF is equivalent to "char(13) & char(10)", so you could search for "char(13) & char(10) & char(13) & char(10)"

              • 4. Re: First post here.  Import of Folder of Text Files
                dreed

                Thanks, that did the trick!  I forgot about the char() function.

                • 5. Re: First post here.  Import of Folder of Text Files
                  sporobolus

                  on 2012-08-09 10:58 dreed wrote

                  2. Get rid of the doubled return characters after import, either in the temp field, or in the extracted .mol text.  I can't figure out the second option as my attempt to strip using

                   

                        Substitute ( FieldName; "¶¶"; "¶")

                   

                  didn't work.  I'm not sure how to identify the pattern of two returns in a row.  Patterncount(field;"¶") finds all the returns, but Patterncount (field;"¶¶") finds nothing.

                   

                  i exported the field contents in your sample (as a best approximation of what

                  your input files look like) and hexdumped it to see your data seems to use CRLF

                  line ends, so the blank lines are represented by the sequence CR LF CR LF (hex

                  0D 0A 0D 0A)

                   

                  if it were me, i'd preprocess the file before importing, but in FileMaker this

                  should work:

                   

                     substitute(substitute (sdf records::sdf text; char(10) & char(13); ¶); "¶¶", ¶)
                  
                  

                   

                  • 6. Re: First post here.  Import of Folder of Text Files
                    Malcolm

                       substitute(substitute (sdf records::sdf text; char(10) & char(13); ¶); "¶¶", ¶)

                     

                     

                    Steve, your sticky note reminder must have fallen off the screen: we don't need to nest substitute functions anymore.

                     

                    substitute (sdf records::sdf text; ; ["¶¶"; ¶] )

                     

                    Malcolm

                    • 7. Re: First post here.  Import of Folder of Text Files
                      sporobolus

                      on 2012-08-09 16:56 Malcolm wrote

                      >>     substitute(substitute (sdf records::sdf text; char(10) & char(13); ¶); "¶¶", ¶)

                      >

                      Steve, your sticky note reminder must have fallen off the screen: we don't need to nest substitute functions anymore.

                       

                      substitute (sdf records::sdf text;  https://fmdev.filemaker.com/message/90969#90969#90969/char(10) & char(13); ¶ Re: First post here.  Import of Folder of Text Files; ["¶¶"; ¶] )

                       

                      can't really see your intended code because stupid Jive mangled your message,

                      but as far as sticky notes, i was too busy getting char() right without being

                      able to test it to remember that, since this machine is still using FMP9; i

                      also got char(10) and char(13) reversed!

                       

                      i think you meant this (with my bug fixed too), which is good to point out:

                       

                      substitute (sdf records::sdf text; [ char(13) & char(10); ¶ ];  [ "¶¶"; ¶ ])
                      

                       

                      (note i'm using Jive's code tags to get past the email mangling)

                       

                      anyhow i just recently attended a Scala workshop … if you want a language where

                      you can always find a terser way to do things, so much it'll blow your mind,

                      that's the one for you

                       

                      also worth noting that FMI hasn't updated its own doco on the subject, >teehee<:

                       

                      <http://help.filemaker.com/app/answers/detail/a_id/3228/~/using-the-substitute-function-to-replace-more-than-one-text-string-in-a-single

                      • 8. Re: First post here.  Import of Folder of Text Files
                        dreed

                        Thanks to all who helped!  I ended up having two different fields, one that stripped the extra return characters, which was useful for copy/pasting the .mol text directly into the drawing application for rendering, and one that left them in.  The one that left them in worked best for using "Export Field Contents" to output the results to a batch of files.  This script step outputs a text file that appears to remove the extra return characters as well.

                         

                        The final trick was getting around the fact that Export Field Contents creates a UTF-16 file, and I need ASCII for the purpose of the applications which expect standard .mol format.  I tried isolating a single record in the found set, and exporting the single field as tab, csv etc.  But the return characters in the field aren't correct then.  I also tried xml export with an xslt modified from the example ones, but although the file looked right, it was still UTF-8, and not recognized.  Finally I decided to export field contents and post-process the data using the windows command shell (TYPE command) I redirected its output to a file and then deleted the original.

                         

                        It works OK, but since the cmd shell is being called ~2000 times on each loop iteration there's a fair bit of flashing in the windows task bar.  I supposed I could create one batch file that iterated all the files for post-processing after the loop export...

                         

                        Thanks again,

                         

                        Dana