I think this question will get more attention in the main Discussion area, so after I respond I'm going to move it there.
The Get(UUID) function generates an arbitrary ID, similar to a serial number but much less likely to be duplicated. I don't see the value of using it for the purpose you describe, though maybe I'm misunderstanding your goal. As you noted, it would be a lot more useful to have a Social Security Number assigned by the government.
Not knowing more about your data, this is probably what I would try. Maybe others will have better suggestions:
1a) Create a foreign key in your 1911 table to store possible matches from the 1901 table (or vice versa). So if you have a field "pk_person_1901" in your 1901 table, then create a new field in your 1911 called "fk_person_1901".
1b) When you find a match between the two tables, you'll copy the primary key value from the 1901 table and store it in the foreign key field in the 1911 table.
2) Look for natural keys (even if imperfect) between the two tables -- last name if you have nothing better -- and create relationships based on these to explore your data for duplicates. When you find true duplicates, then you'll do step 1b above.
I noticed that you said you have two databases. Given that you are working with census data from two different years, you have several options here for how you store the data:
A) Store the data in separate database files (which it sounds like you've done). This is generally where people start if they've converted Excel files to FileMaker database files.
B) Store the data in two separate tables within a single database file.
C) Store the data in the same table, but have a "CensusYear" field to tell the records apart.
Option C will work best if the census data points are almost the same. If not, your table will have some fields that only contain data from the 1901 census, and other fields at only contain data from the 1911 census.
Hope this helps --
can you give us a list of all the fields out of which an entry in the available census(es) is made of ?
Thank you for your reply.
At present, I have the data stored as per your Option B, ie separate tables within a single database.
However, as the questions asked are exactly the same (except 1911 contains 3 extra questions) I did also play around already with your Option C.
I seem to have some kind of conceptual mental block about this Foreign Key idea.
> 1b) When you find a match between the two tables, you'll copy the primary key value from the 1901 table and store it in the foreign key field in the 1911 table.
For the sake of argument let us look at the mythical Mary McCarthy, who I discover by co-relating both censuses is present on both.
Mary McCarthy 1901 P_ID 56
Mary McCarthy 1911 P_ID 77 (so I will store fk_person_1901 P_ID 56)
There are ± 320 people who appear in both censuses so I will place their 1901 P_ID in the foreign key field in 1911 fk_person_1901 but what do I put the field when there is no match?
Can a Foreign Key field contain a 'Null' value?
I must apologise but I am afraid I do not understand your question exactly.
Is this what your require?
The 1901 census has fields such as Surname, Forename, Age, Townland, Literacy, Religion, Birthplace, SpecifiedIllnesses, Relation to head of household.
The 1911 census has exactly the same fields as 1901 with the addition of 3 extra question Number of Years Married, Number of Children Born, Number of Children Still Living.
If I am not giving you what it is you need you can look at
You've got it right. When there is no match, there is nothing to do -- you just leave the foreign key field empty.
I'm not sure what you've called your tables, but I'll call them Census1901 and Census1911.
The purpose of this data is to support a relationship between a table occurrence for Census1901 and a table occurrence for Census1911, where:
Census1901::P_ID = Census1911::[foreign key field name]
Note that while people have a variety naming conventions, here's a typical approach to naming primary and foreign keys, where the table name is included in the field name:
My guess is that Siplus wants to help you identify fields that would be useful for finding matches between your two census tables. It would be thoughtful to give him a full list of the fieldnames since that would help him to advise you. :-)
If you do not have an identifier such as a personal ID number you might want to combine, first, last and date of birth if you happen to have it (or you can calculate it back from their age, but that's never perfectly accurate). Or at least place of birth. It is still a hard task to match people, because they get married and change surnames. So really, the only things that don't change is their mother's name and birth date and place.
So if you have nothing to tie the people together in two different census databases you will never have a perfect match. So treat this with caution.
Hope this helps,
I agree with Agnes - my motto is that bad data is worse than no data.
Jane Doe which was 11 in 1901 could have in fact been 9 yrs old and unable to correctly spell her name, and Janet Smith 21 years old in 1911 could be her, after marriage, or could be a totally different person, as can be Jane Doe claiming to be 20 in 1911.
Only a research on field can establish "sure pairs" - (x1;x2) with X2 being the same as X1, 10 years later.
After having these certain pairs and hashing them out from the databases you can begin to speculate on the remaining, but at least you divided your data into sure and unsure.
I agree as well and want to clarify that my suggestion of using various natural keys was only for research purposes. I like to use "loose" matches to narrow things down, then I eyeball the data to flag those records that I suspect are true duplicates.
In this case, you may have to do additional research beyond the data in your tables... it all depends on what you have to work with. But it's better to risk duplicates than to match records together that shouldn't be matched.
Thank you (and Siplus).
I probably put the cart before the horse. I did an MA thesis (as a very mature student last year) one element of which was a painstaking analysis of both sets of data, so my assertion that a person is a member of 'both' the 1901 and 1911 census is based on research.
I am now learning a bit about databases and so I am retrospectively trying to put that research into a more useable form via Filemaker.
That changes everything -- very helpful to know. What form does this research take? Is it a list of pairs of IDs -- or a list of pairs of names -- or something else?
Perhaps you could establish a relationship that allows creation of related records (on the foreign key side of the relationship -- i.e. the 1911 census). Then you could find those 1901 census records where you know there is a match and create a simple interface for choosing the 1911 record you want to match with.
I used Surname, Forename and Townland (a small geographical division of land used in Ireland). As there was very little population migration in Ireland at the time, unlike, say the US, then this latter field is a very good one to co-relate the population over the ten year period.
I then did a fudge (knowing nothing about databases then). Using a programme called MAMP I learned some basic MySQL and created file there which had entities 1901Only, 1911Only and Both (for those people who I could 'prove' were present in both 1901 and 1911).
As it transpires this made up method is not a million miles away from what I am now doing in FM. I did not know about Foreign Keys and suchlike at the time.