AnsweredAssumed Answered

Nightmare Key Field

Question asked by AllegroDataSolutions on Mar 28, 2012
Latest reply on Mar 29, 2012 by RayCologon

I am currently building a database for a nonprofit organization that has been hired by State and federal agencies to distribute benefits to participants under several different programs. The format of the data that is coming into the ogranization from these sources and has to be imported into the FileMaker solution is an absolute nightmare. The agencies will not alter their format, so I have had to script the imports into a temporay table, manipulate the data there, then import it into the "real" tables that have been designed as per the client's specs.


I have gotten everything to work (i.e. all the data can be successfully extracted from the original fields and parsed into the tables where the client wants it) but I am not happy with the performance in one area. The source is a huge spreadsheet (several thousand rows) where the first column contains what appears to be data from several different fields in the original database and is over 100 characters long. That field contains the primary key - which has letters, spaces and numbers, and varies in length. The surrounding text also varies in lenth and is inconsistently entered. So the key field can appear virtually anywhere in the line. I am currently extracting the key with a subscript that loops through a value list containing all possible keys and tests for them using PatternCount(). If the result = 1, the value tested is inserted into the field.


While the above approach works, running the script is time consuming. I am wondering if (a) there is a way to get the same result with a formula and (b) if the formula would actually be any faster. Here's a brief sample of the type of values I have to work with (I am not using actual records in my example because the data is confidential):


"PROGRAM NAME:JSm 001 John Smith $22,365: JSm 18 Line Item 18 Name $450"

"PROGRAM NAME:JSm 001 John Smith $8,000: JSm 34 Line Item 34 Name $55"

"COMPLETELY DIFFERENT PROGRAM NAME:JDoe 001 Jane Doe 22,365: JDoe 10 Line Item Name 425"

"02 ANOTHER BUDGET LINE ITEM: ABChar 001 Able Baker-Charlie $500: ABChar 66 Reimbursed expenses $50"


In these examples, the value following the second colon is what we want:


JSm 18

JSm 34

JDoe 10

ABChar 66


About the only consistancy in the source material is that there are two colons in the rows we are importing (though they can appear in different places in the field).


I'd be interested to see if anyone can come up with a formula that works -- and, if using it as an auto-enter calc is any faster than running the import sub script.