1 2 Previous Next 16 Replies Latest reply on Feb 8, 2012 7:11 PM by jonathanschoel

    Import/parse tab-delimited csv fields in quotes?

    jonathanschoel

      Thanks in advance for your help!

       

      I have a tab txt file and the data is encapsulated with quotes (""). So it goes "Data" (TAB) "Data" (TAB) "Data"

      The file contains the profile information of people who are applying for jobs so the data coming over is their contact, work history, resumes, etc. The resumes, with all of the extra spaces, were the problem (which is the reason for the quotes). My test file (attached) contains 6 dummy records. Without the quotes, FM imported the data and created over 700 records - a new record for every line of each resume. When I removed the resumes from the file the import worked great - but I need the resumes! Hence the quotes.

       

      I'll be getting a new file on a recurring basis but I'm stuck on how I can tell the system that the data in the first quote goes to the NameFirst filed and the data in the second quote goes to the NameLast field, etc. I'm sure there is a way to script the system to "scrub" the data, maybe even reduce all of the spaces in the resume to single spacing (formating isn't as important as importing!) before import but the spacing isn't as much of an issue as the system knowing all of the resume data between the quotes goes into the ResumeText field.

       

      Argh. My brain is stuck. Help?

        • 1. Re: Import/parse tab-delimited csv fields in quotes?
          comment

          Can it be assumed that the first line, enumerating the field names, will be the same in every file you import?

           

           

          BTW, why won't your data supplier use one of the standard formats, (e.g. tab-delimited or comma-separated or xml) instead of putting you through all this extra work?

          • 2. Re: Import/parse tab-delimited csv fields in quotes?
            jonathanschoel

            Michael,

            Yes, the first line will always be the same for each file.

             

            We originally tried all 3 of the methods you mentioned.

            1. Their XML generation was incompatible with FM, several days and iterations failed so that was out.
            2. We tried comma-seperated but everyone uses commas in their resume, i.e., I graduated from the University of Colorado, Boulder so "University of Colorado" would populate the "School" field and "Boulder" would populate the "ExperienceYears" field (next field). When it comes to contact/personal info commas are everywhere.
            3. That left tab but again, LOTS of spaces in the resumes. A tab is the same as about 5 spaces and the system continually tripped on them.

             

            Which left us with tab-delimited but now enclosing the data with quotes. I'm just stuck on how to write a custom import script where I can match up fields...

             

            Thanks for the response!

            • 3. Re: Import/parse tab-delimited csv fields in quotes?
              comment

              jonathanschoel wrote:

               

              Their XML generation was incompatible with FM, several days and iterations failed so that was out.

               

              Practically anyone's XML will be incompatible with Filemaker's. This is easily solved by using a XSLT stylesheet during the import to convert between the schemas.

               

               

              jonathanschoel wrote:

               

              We tried comma-seperated but everyone uses commas in their resume,

               

              Yes, so? The comma-separated format is equipped to handle those - see:

              http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules

               

               

              I would strongly urge you to solve this properly at the source, instead of throwing your needles into a haystack and trying to find them later.

              • 4. Re: Import/parse tab-delimited csv fields in quotes?
                jonathanschoel

                Wow Michael, I'm not sure I understand that last sentence and I'm kindof confused by the tone. I'm not able to manipulate "the source," I get what I get and I'm just trying to make it work.

                 

                Regarding XML:

                I don't know HOW to make a XSLT stylesheet and after several hours of looking I couldn't find anything to explain it to me and even though I've read in various resouce/training books there are templates on FM's main site I sure couldn't find them. What I did find, however, was a notice that FM will no longer supprt XSLT in the next full version. I don't want to build a solution that will not be supported in FMP/FMS 12.

                I mentioned above that we explored this option quite a bit before moving on...

                 

                Regarding comma-seperated:

                Have you even looked at the file I attached? The first csv version is exactly the same as the attached only without the quotes. Take a peek then re-read the link you posted:

                 

                "Basic rules

                The basic rules from a lot of these specifications are as follows:

                CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines. Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes. If a line contains a single entry which is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.

                • Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A), however, line-breaks can be embedded.
                • Fields are separated by commas. (In locales where the comma is used as a decimal separator, the semicolon is used instead as a delimiter. The different delimiters cause problems when CSV files are exchanged, for example, between France and USA.)"

                 

                Again, I don't have control over the source and the data either comes over as all comma-seperated or all quotes encapsulated. I don't have the ability to tell them to enclose certain info in quotes or double-quotes and to leave other data alone. There are "newlines" everywhere in the data - especially in the resume section. There are carriage returns everywhere in the resumes. Each record is NOT terminated by a new line in the file(s) provided. Hence the 749 new records in my solution when importing only 6 applicants!

                 

                I'm not sure about the reference of the needles and haystack but if you're implying I haven't done some research before coming to the forum and asking for help then you're mistaken. I'm soaking up the knowledge here like a sponge and certainly appreciate this incredible resource. I've also searched the forums before posting my question. I'm genuinely looking for the best solution by asking people who obviously have more experience than I and I highly value their/your input.

                 

                I need ideas on how I can scrub what lands on my desk to properly import into my solution. I'm still not sure how to do it but thanks for your suggestions.

                • 5. Re: Import/parse tab-delimited csv fields in quotes?
                  beverly

                  Jonathan, I've looked at your example file. The problem is that it's NOT all exactly "data"TAB"data"TAB"data"...RETURN per "row/record".

                   

                  While they have put quotes around all the "data", to preserve returns "in field/column", they have NOT escaped the double quotes that may be inside the "data".

                   

                  Because you don't know what the real delimiters are and what are not, they will not import and they will be difficult to "parse".

                   

                  When I changed the TAB to a comma (and changed the double quotes inside every "data" - to single quotes), then I was able to import directly. This is the format of a CSV file. Returns-in-field will be preserve on import if they are within the double quote. But any double quotes must be altered to import correctly.

                   

                  HTH,

                  Beverly

                  • 6. Re: Import/parse tab-delimited csv fields in quotes?
                    beverly

                    Jonathan and all XSLT for use with WEB PUBLISHING (please read the articles carefully!) will not be available.

                     

                    XSLT for use with IMPORT & EXPORTED does still work and should work in future versions. (of course, FMI has the prerogative to change anything...

                     

                    This article on this forum is very misleading! https://fmdev.filemaker.com/docs/DOC-1133

                    When you click the "learn more" link, you go to the real announcement:

                    http://help.filemaker.com/app/answers/detail/a_id/7701

                    "Deprecation of XSLT API for Custom Web Publishing"

                     

                    Notice the "CUSTOM WEB PUBLISHING"?

                     

                    Beverly

                    • 7. Re: Import/parse tab-delimited csv fields in quotes?
                      beverly

                      [quote]Again, I don't have control over the source and the data either comes over as all comma-seperated or all quotes encapsulated. I don't have the ability to tell them to enclose certain info in quotes or double-quotes and to leave other data alone. There are "newlines" everywhere in the data - especially in the resume section. There are carriage returns everywhere in the resumes. Each record is NOT terminated by a new line in the file(s) provided. Hence the 749 new records in my solution when importing only 6 applicants![/quote]

                       

                      Jonathan, read my post, I looked at your file. the double-quotes inside double-quotes are the problem, not the returns. But when they combined TAB and quoted the fields, it was no longer an easy import into FileMaker.

                       

                      On a good note, they don't seem to have TABs in any place except between the "data" fields! With the double-quote inside fields fixed, you may be able to "parse", but again it will be very difficult to "import". Bringing in will be so much work, you may want to open with Excel and export as proper CSV or Tab!

                       

                      Added:

                      Sorry, I just tried to import into Excel and ran into the same problems. You may have to request a change of format, or you may be manually checking.

                       

                      Message was edited by: Beverly Voth

                      1 of 1 people found this helpful
                      • 8. Re: Import/parse tab-delimited csv fields in quotes?
                        jonathanschoel

                        Thank you Beverly! You've illuminated my brain!

                        • 9. Re: Import/parse tab-delimited csv fields in quotes?
                          comment

                          jonathanschoel wrote:

                           

                          Wow Michael, I'm not sure I understand that last sentence and I'm kindof confused by the tone.

                           

                          Please don't be. I am trying to outline what I think will be the path of least resistance for you. It may not be a direct answer to your question, but still...

                           

                           

                          jonathanschoel wrote:

                           

                          Regarding XML:

                          I don't know HOW to make a XSLT stylesheet and after several hours of looking I couldn't find anything to explain it to me

                           

                          Why not ask for help with it, then? As Beverly noted, XML/XSLT import/export is NOT being deprecated. IMHO, of all possible intermediate formats, XML is by far the most preferable one.

                           

                           

                          jonathanschoel wrote:

                           

                          Have you even looked at the file I attached?

                           

                          Yes, I have and I have reached the same conclusion as Beverly did: with only a few changes, Filemaker would be able to import the file as is. FWIW, I am attaching the modified file. But this is all kind of beside the point. You are asking how to handle this automatically: I am not even sure it's possible - but I am quite sure it will take a lot of work if it can be done at all. Hence my reference to needles in a haystack.

                          • 10. Re: Import/parse tab-delimited csv fields in quotes?
                            karendweaver

                            Jonathan

                             

                            You "may" be able to solve this problem with a two-step process.  First, map the fields to import EXCLUDING the Resume Text.  That should give you the 6 records you want.

                             

                            Then import the entire document into a global field in a temporary table with 2 fields - application ID and ResumeText.

                             

                            You can then parse out the application ID and the Resume Text into six records - this will take some fancy scripting, but it can be done.  You may want to do some searching on parsing text.  Here's a few places to check:

                             

                            http://www.atpm.com/12.05/filemaking.shtml

                            http://www.youtube.com/watch?v=qtQB7CbvJOI

                             

                             

                            Then you can update the "real" record by matching application ID and importing or using set field to replace the contents of Resume text with the extra text.

                             

                            Hope that helps!

                             

                            Karen

                            1 of 1 people found this helpful
                            • 11. Re: Import/parse tab-delimited csv fields in quotes?
                              comment

                              karendweaver wrote:

                               

                              First, map the fields to import EXCLUDING the Resume Text.  That should give you the 6 records you want.

                               

                              I don't see how dropping the field from the import will help - Filemaker still looks at it when it tries to parse the file.

                              • 12. Re: Import/parse tab-delimited csv fields in quotes?
                                karendweaver

                                Of course you are right - I was REALLY unclear.  There are several ways to go about it - depending on how many of these he is going to have to process and how frequently he will get them and how many applicants are in each file.

                                 

                                If he is only going to need to do it occasionally, or there are not many applicants per file, as in the example, I would probably just make a copy of the import file with the resume text deleted for the first import, then a second import of the whole file parsing out the resume text.

                                 

                                Or - I could import the entire file and delete the "garbage" records so I was left with the 6 records I wanted

                                 

                                Or - I could import the entire file and create a looping script that captures the individual record data, including the resume text - but that is a more complex script for the parsing. 

                                 

                                If there are hundreds of applicants in one text file - probably the second or third choice would be a better method.  But if there are only a few each time - as in the sample file - then it may be easier to just do it in two steps, especially if Jonathan is not familiar with parsing text and writing scripts.

                                 

                                And I am sure that someone will have a better way of doing this task!

                                 

                                Karen

                                • 13. Re: Import/parse tab-delimited csv fields in quotes?
                                  comment

                                  karendweaver wrote:

                                   

                                  I could import the entire file and delete the "garbage" records so I was left with the 6 records I wanted

                                   

                                  IIUC, there are no "garbage" records in the example file. The problem is that the format chosen is inconsistent regarding the delimiters used. It doesn't matter which method you choose, you still need to come up with an algorithm that will reliable parse the text into records and fields (unless you intend to do it manually).

                                  • 14. Re: Import/parse tab-delimited csv fields in quotes?
                                    jonathanschoel

                                    Michael,

                                    Gotcha! I get what you were saying now. And yeah, I REALLY misunderstood the whole XML issue. But it's piqued my interest and I think I'll look into those style sheets further. I don't do much web interaction right now, aside from ODBC imports, and I'm looking to expand that skillset.

                                    That said, I really appreciate you jumping into the discussion. One of the best things about TechNet is the discussion and bringing so may viewpoints to the table. FM is great in that there are several angles to every roadblock and it's usually the sharing of different ideas that keep us all on our game and constanly improving, IMHO.

                                    So thanks for helping me out!

                                    1 2 Previous Next