2 Replies Latest reply on Feb 23, 2017 12:01 AM by siplus

    How To Parse E-Mail Addresses with Punctuation


      I'm trying to import into a FileMaker database data that had originally been stored in a spreadsheet and was exported as comma-separated values. Of course, the incoming info was stored in concatenated form (as in Name = "John Paul Jones"), not in component form (as in First = "John", Middle = "Paul", and Family = "Jones"), so I'm having to do a certain amount of parsing to get it properly distributed into the appropriate fields.


      The thing that's giving me the most grief is the Eddress (e-mail address) column. Most of the records have just a single eddress, but a non-trivial number of them have 2 (and a very few have 3). And they weren't stored consistently. Some are separated from each other by commas, some by semicolons, and some by the word "or".


      I've tried to use FileMaker's various "Word" functions to get a handle on where one word ends and the next begins, but a brief perusal of this set of sample data shows that it's not exactly a simple deal:


      Parsing Eddresses.jpeg


      For example, on Line 2 it appears that "xyz. 123" (with a space after the period) is treated as 2 words, whereas on Line 3 "xyz.123" (no space) is treated as 1. "PatLee" and "Pat.Lee" each appear to be 1 word, but "Pat-Lee" and "Pat_Lee" are 2.


      I expect to be able to Find for commas, semicolons, and spaces to identify records with multiple eddresses in them, but thereafter I'm looking for ways to automate the parsing process to tell where one of them ends and the next one begins. Any clever ideas?