9 Replies Latest reply on Dec 20, 2010 10:06 AM by philmodjunk

    complicated import series - do I need to move records between tables?

    tmlutas

      Title

      complicated import series - do I need to move records between tables?

      Post

      I have a situation where I have dozens of import files, all subsets of a larger database (voter records). I cannot simply import the full database. I only have access to slices of the larger datasets for various legal reasons. The task I need to do is to import a list, let's call it 2010 general election, mark all the records as having voted in the 2010 general election and then import the 2010 primary file, mark all the records as having voted in the 2010 primary, etc, etc back for a decade. Then keep on doing imports for various affiliations (left handed pipe fitters, etc) all of which have to not overwrite the existing status information. Oh, and it's all got to be doable by your average local politician who gets access to these sorts of data sets and guards them jealousy (in other words the solution has to be fully scripted to add new types of status stuff on the fly via the GUI inside a runtime). Searches might be along the lines of left handed pipe fitters who are not catholics who voted in the 2007 GOP primary.

      The solution I'm thinking about is to have an import table, run my import, add the status bits that need to be added on existing records and move new records over to the actual records database. Is there a better way?

        • 1. Re: complicated import series - do I need to move records between tables?
          philmodjunk

          What problem are you solving with your import table?

          This seems like an import that can be done directly to the actual table. Immediately after import, the found set will consist of all the newly imported records so this is the perfect time to use Replace Field Contents and other actions to make last minute modifications of your data as part of the import process.

          I am assuming that each import is intended to create a new record in the actual table--not update an existing one.

          • 2. Re: complicated import series - do I need to move records between tables?
            tmlutas

            PhilModJunk - Your assumption that new records are being created is incorrect. Most of the time new records will not be created. Let's take a hypothetical John Smith. He gets loaded in with the initial load of "current registered voters" that is public record. Then when I get a listing of 2010 primary voters, he gets loaded again and now I need to match him, name and address, and create some status bits that say that he's a 2010 primary voter. Now when I get around to the 2009 primary lo and behold he's in it again.

            And that's where it gets complicated because his wife Jane Smith did vote in the 2009 primary but not in the 2010 one. Now how does one fix the status fields with Replace Field Contents? I don't think you can. A different solution is called for where you look up the status and modify it to include the new information or you create a status table and create related records for each member of the import and then do all your searches on the status table.

            Wait a second, did I just solve my own question? *conditional facepalm*

            While I'm here, I'm looking at a county of about 175,000 voters each of which is likely to have dozens of status entries. How big can that status table get before it gets unwieldy?

            • 3. Re: complicated import series - do I need to move records between tables?
              philmodjunk

              Now how does one fix the status fields with Replace Field Contents?

              This is just one possible tool for updating a field for an entire group of records in one pass. One option that this tool has is to use a cacluation to determine the value for each record so Replace Field contents can put a different value in each field as needed.

              When it comes to size of data sets such as your Status table, FileMaker does have its limits. It's impossible for me to tell from what you've described as to how big is "too big".

              Matching your imported data to existing data is likely to be very problematic if all you have are names and addresses with which to do the matching. People move, people change their names, names get entered with typos and/or alternate spellings and names are not unique. Do you have a voter registration number? That would seem to be a better option for matching the records up.

              • 4. Re: complicated import series - do I need to move records between tables?
                tmlutas

                PhilModJunk - I don't think that you responded to my correction that most of the time I'm going to be updating, not adding voter records. When you're going to be updating 90%+ of the time and you are unsure whether the existing or the imported version has better data quality, does it make sense to do a direct import? If not, then we're back to what's the best method to move the data after you've imported and cleaned it up.

                As for the prospective size, let's say I have 175k voters and there are going to be an average of 10 status entries per voter. That's 1.75m records of a status code (maybe 4 digits) and a voter UID. Now I looked up the tech specs but it didn't make sense "Number of records per table: 64 quadrillion total records over life time of file." What are they talking about over life time of file?

                On data matching, the national party's doing most of the heavy lifting on this and there are actually 3 UIDs in the existing 69 field data record coming from different sources. But that's no guarantee that I'm not going to get data without a UID. I can't rely on it being present.

                • 5. Re: complicated import series - do I need to move records between tables?
                  philmodjunk

                  I didn't respond to that because I agreed with you. Wink

                  It can be very useful to import data into such an import table--especially if any human inspection/intervention is needed prior to updating the records in the main data table.

                  What are they talking about over life time of file?
                  I think this means the number of records that can be created in one table and whether to keep all of them or delete most of them does not affect this limit. It really doesn't tell you much to answer your question. There are many factors that will limit the usability of your table long before you reach this limit. This includes both hardware and database design considerations. Obviously, if you are going to be generating a very large file, you'll want fast processors and a fast hard drive array. Less obvious and harder to quantify is how the design of the table and what operations you need to perform on it will determine the practical limits of your system. If you have just a few fields in each status record and no unstored calculaitons and summary fields, you can get pretty good response times even with a very large number of records. If you find you need to perform a sort or find that refers to an unindexed or unstored field, things get very slow long before you reach this limit.

                  You may want to have a plan B in mind for moving your data into a different system where FileMaker continues to serve as the front end if you find your system has grown to the point where FileMaker alone no longer cuts the mustard.  This enables you to go forward with your design without getting boxed in at a future time.

                  Any imported data without a reliable ID that you can use to match to existing data will be a major challenge for you to work with. I think you'll need to be prepared to spend considerable time analyzing the quality of your imported data and devising scripts that help you find and fix problem records in order to keep your data as reliable as possible.

                  • 6. Re: complicated import series - do I need to move records between tables?
                    tmlutas

                    Ok, we agree on the need to move records from an import table to a main table. Now how does one do that? Every thread in the forums seems to be focused on trying as hard as possible not to have to do those sorts of moves and I haven't really seen a good, flexible solution that is generic.

                    I know that you should be able to request all the field names in the two tables and then compare the two. It should be possible to programmatically make sure that all fields in the import table that do not exist in the main table are created. It should be possible to move things over, perhaps with a filemaker import? I'm not sure whether you can import within a FM database.

                    • 7. Re: complicated import series - do I need to move records between tables?
                      philmodjunk

                      Have you read up on Import Records in FileMaker Help? this is the starting point for what you describe.

                      Import records can move any number of records from quite a few different file formats including tab, csv, excel as well as Filemaker.

                      What you can't do is this: It should be possible to programmatically make sure that all fields in the import table that do not exist in the main table are created.

                      Perhaps that's a mis-statement as I don't see why you need to do that. You can certainly use scripts, caclulations, relationships etc. to selectively move data from fields in one table into fields of a matching record in another table. And yes, Import Records can be used to move records from one FileMaker table to another within the same file.

                      • 8. Re: complicated import series - do I need to move records between tables?
                        tmlutas

                        I've reworked how I'm going to do things. What's going to happen is this.

                        1 table for imports

                        1 table for main database

                        1 table for status types

                        1 table for status items, what I call affiliations

                        The import and main database have the same number of fields.

                        I'm looking to import to the import table, verify that everything has a matched record in the main database and then add records to the affiliations table, one for each record in imports, and then delete the record in imports.

                        Affiliations would consist of two fields, a status type ID (with a 1 (status type) to many (affiliations) relationship) and a voter ID (with a 1 (voter table) to many (affiliations) relationship).

                        Is this going to work?

                        • 9. Re: complicated import series - do I need to move records between tables?
                          philmodjunk

                          Don't see why not. I'm assuming one record in the main database represents one voter.

                          Main----<statusItems

                          Main::VoterID = StatusItems::VoterID

                          It seems possible that you might already have a status Item record that then appears repeatedly in the Imports table, so I'd include code that checks for the existance of a status item before creating it so you don't get duplicates.