12 Replies Latest reply on Jun 9, 2015 7:43 AM by annoici

    Census database

    annoici

      Hi

       

      My first question here.

       

      The Irish census of 1901 and 1911 are both online. I have created two small databases of my local area which have 810 and 729 entries respectively.

       

      I then thought it would be interesting to try and track people who may have been recorded on both censuses. This is, alas, not just a question of adding ten years to 1901 as people were sometimes unaware of their birth date, sometime lied. In fact one woman was one year younger in 1911 than 1901.

       

      Obviously I have a primary key for each census. Currently this is just 1 to 810 and 1 to 729 in each case. My conundrum is how do I create a satisfactory key for people I discover to be present in both censuses? I guess something akin to a social security number which would be constant throughout ones life. I did think that I might be able to use the UID function but I was unable to get it to work as required.

       

      Any suggestions welcomed.

       

      Brendan

        • 1. Re: Census database
          mark_baum

          Hi Brendan,

           

          I think this question will get more attention in the main Discussion area, so after I respond I'm going to move it there.

           

          The Get(UUID) function generates an arbitrary ID, similar to a serial number but much less likely to be duplicated. I don't see the value of using it for the purpose you describe, though maybe I'm misunderstanding your goal. As you noted, it would be a lot more useful to have a Social Security Number assigned by the government.

           

          Not knowing more about your data, this is probably what I would try. Maybe others will have better suggestions:

           

          1a) Create a foreign key in your 1911 table to store possible matches from the 1901 table (or vice versa).  So if you have a field "pk_person_1901" in your 1901 table, then create a new field in your 1911 called "fk_person_1901".

           

          1b) When you find a match between the two tables, you'll copy the primary key value from the 1901 table and store it in the foreign key field in the 1911 table.

           

          2) Look for natural keys (even if imperfect) between the two tables -- last name if you have nothing better --  and create relationships based on these to explore your data for duplicates. When you find true duplicates, then you'll do step 1b above.

           


          I noticed that you said you have two databases. Given that you are working with census data from two different years, you have several options here for how you store the data:

           

          A) Store the data in separate database files (which it sounds like you've done). This is generally where people start if they've converted Excel files to FileMaker database files.

           

          B) Store the data in two separate tables within a single database file.

           

          C) Store the data in the same table, but have a "CensusYear" field to tell the records apart.

           

           

          Option C will work best if the census data points are almost the same. If not, your table will have some fields that only contain data from the 1901 census, and other fields at only contain data from the 1911 census.

           

          Hope this helps --

           

          Mark

          • 2. Re: Census database
            siplus

            can you give us a list of all the fields out of which an entry in the available census(es) is made of ?

            • 3. Re: Census database
              annoici

              Hi Mark,

               

              Thank you for your reply.

               

              At present, I have the data stored as per your Option B, ie separate tables within a single database.

               

              However, as the questions asked are exactly the same (except 1911 contains 3 extra questions) I did also play around already with your Option C.

               

              I seem to have some kind of conceptual mental block about this Foreign Key idea.

               

              > 1b) When you find a match between the two tables, you'll copy the primary key value from the 1901 table and store it in the foreign key field in the 1911 table.


              For the sake of argument let us look at the mythical Mary McCarthy, who I discover by co-relating both censuses is present on both.

               

              Mary McCarthy 1901 P_ID 56

              Mary McCarthy 1911 P_ID 77 (so I will store fk_person_1901 P_ID 56)


              There are ± 320 people who appear in both censuses so I will place their 1901 P_ID in the foreign key field in 1911 fk_person_1901 but what do I put the field when there is no match?


              Can a Foreign Key field contain a 'Null' value?


              Regards

              Brendan

              • 4. Re: Census database
                annoici

                Hi Siplus,

                 

                I must apologise but I am afraid I do not understand your question exactly.

                 

                Is this what your require?

                 

                The 1901 census has fields such as Surname, Forename, Age, Townland, Literacy, Religion, Birthplace, SpecifiedIllnesses, Relation to head of household.

                 

                The 1911 census has exactly the same fields as 1901 with the addition of 3 extra question Number of Years Married, Number of Children Born, Number of Children Still Living.


                If I am not giving you what it is you need you can look at

                National Archives: Census of Ireland 1911


                Regards

                Brendan




                • 5. Re: Census database
                  mark_baum

                  You've got it right. When there is no match, there is nothing to do -- you just leave the foreign key field empty.


                  I'm not sure what you've called your tables, but I'll call them Census1901 and Census1911.


                  The purpose of this data is to support a relationship between a table occurrence for Census1901 and a table occurrence for Census1911, where:


                  Census1901::P_ID = Census1911::[foreign key  field name]


                  Note that while people have a variety naming conventions, here's a typical approach to naming primary and foreign keys, where the table name is included in the field name:

                   

                  In Census1901:

                  pk_census1901_id

                   

                  In Census1911:

                  pk_census1911_id

                  fk_census1901_id

                  • 6. Re: Census database
                    mark_baum

                    Hi Brendan,

                     

                    My guess is that Siplus wants to help you identify fields that would be useful for finding matches between your two census tables. It would be thoughtful to give him a full list of the fieldnames since that would help him to advise you. :-)

                     

                    Cheers --

                     

                    Mark

                    • 7. Re: Census database
                      ariley

                      If you do not have an identifier such as a personal ID number you might want to combine, first, last and date of birth if you happen to have it (or you can calculate it back from their age, but that's never perfectly accurate). Or at least place of birth. It is still a hard task to match people, because they get married and change surnames. So really, the only things that don't change is their mother's name and birth date and place.

                       

                      So if you have nothing to tie the people together in two different census databases you will never have a perfect match. So treat this with caution.

                       

                      Hope this helps,
                      Agi

                      • 8. Re: Census database
                        siplus

                        I agree with Agnes - my motto is that bad data is worse than no data.

                         

                        Jane Doe which was 11 in 1901 could have in fact been 9 yrs old and unable to correctly spell her name, and Janet Smith 21 years old in 1911 could be her, after marriage, or could be a totally different person, as can be Jane Doe claiming to be 20 in 1911.

                         

                        Only a research on field can establish "sure pairs" - (x1;x2) with X2 being the same as X1, 10 years later.

                        After having these certain pairs and hashing them out from the databases you can begin to speculate on the remaining, but at least you divided your data into sure and unsure.

                        • 9. Re: Census database
                          mark_baum

                          I agree as well and want to clarify that my suggestion of using various natural keys was only for research purposes. I like to use "loose" matches to narrow things down, then I eyeball the data to flag those records that I suspect are true duplicates.

                           

                          In this case, you may have to do additional research beyond the data in your tables... it all depends on what you have to work with. But it's better to risk duplicates than to match records together that shouldn't be matched.

                          • 10. Re: Census database
                            annoici

                            Hi

                             

                            Thank you (and Siplus).

                             

                            I probably put the cart before the horse. I did an MA thesis (as a very mature student last year) one element of which was a painstaking analysis of both sets of data, so my assertion that a person is a member of 'both'  the 1901 and 1911 census is based on research.

                             

                            I am now learning a bit about databases and so I am retrospectively trying to put that research into a more useable form via Filemaker.

                             

                            Regards

                            Brendan

                            • 11. Re: Census database
                              mark_baum

                              That changes everything -- very helpful to know. What form does this research take? Is it a list of pairs of IDs -- or a list of pairs of names -- or something else?

                               

                              Perhaps you could establish a relationship that allows creation of related records (on the foreign key side of the relationship -- i.e. the 1911 census). Then you could find those 1901 census records where you know there is a match and create a simple interface for choosing the 1911 record you want to match with.

                              • 12. Re: Census database
                                annoici

                                Hi

                                 

                                I used Surname, Forename and Townland (a small geographical division of land used in Ireland). As there was very little population migration in Ireland at the time, unlike, say the US, then this latter field is a very good one to co-relate the population over the ten year period.

                                 

                                I then did a fudge (knowing nothing about databases then). Using a programme called MAMP I learned some basic MySQL and created file there which had entities 1901Only, 1911Only and Both (for those people who I could 'prove' were present in both 1901 and 1911).

                                 

                                As it transpires this made up method is not a million miles away from what I am now doing in FM. I did not know about Foreign Keys and suchlike at the time.

                                 

                                 

                                Regards

                                Brendan