I've been able to solve this with a loop in a script, but it is PAINFULLY slow as there are over 9 million records. There has to be a more efficient way, please don't say Summary fields either, lol.
please don't say Summary fields either
Why not? What makes that option an undesirable choice here?
You may not want us to say "summary fields," but they can be made to count unique values. "Sum the reciprocal" is one option for getting a count of unique values for the records in your found set; it requires a combination of summary and calculation fields to work.
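To illustrate the idea behind "sum the reciprocal" (not FileMaker syntax, just a minimal Python sketch of the arithmetic with made-up game IDs): each record contributes 1/N, where N is how many records share that record's value, so the fractions for any one value always sum to exactly 1, and the grand total equals the number of distinct values.

```python
from collections import Counter

def count_unique_by_reciprocal(values):
    """Each record contributes 1/N (N = records sharing its value);
    the contributions for one value sum to 1, so the total is the
    number of distinct values."""
    counts = Counter(values)
    return round(sum(1 / counts[v] for v in values))

# Hypothetical game IDs: G1 appears twice, G2 once, G3 three times.
games = ["G1", "G1", "G2", "G3", "G3", "G3"]
print(count_unique_by_reciprocal(games))  # 3 distinct games
```

In FileMaker the per-record fraction is a calculation field and the total is a summary field that sums it, which is why both field types are needed.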
To get an "Or" relationship you can use either a repeating field, a multi-key field or an intermediate "join" table.
Here's a multi-key approach that works with the example shown.
Define cMultiKey as: List ( Field1 ; Field2 ; Field3 )
Now you can define this self join relationship:
YourTable::GlobalSearch = YourTable 2::cMultiKey
and now entering/selecting "Jones" in GlobalSearch will match all records with Jones in Field1, Field2, OR Field3. That's not the solution to this issue, however.
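The matching behavior of that self join can be sketched in Python (a toy model with hypothetical records, not FileMaker syntax): the multi-key is a return-separated list of the three fields, and the relationship matches when the search value equals any one line of that list.

```python
def multikey(record):
    # cMultiKey = List ( Field1 ; Field2 ; Field3 ):
    # a return-separated list of the three field values
    return "\n".join(record[f] for f in ("Field1", "Field2", "Field3"))

def matches(search, record):
    # the relationship matches if the search value equals ANY line
    # of the multi-key, giving an OR across the three fields
    return search in multikey(record).split("\n")

records = [
    {"Field1": "Jones", "Field2": "Smith", "Field3": "Test"},
    {"Field1": "Brown", "Field2": "Lee",   "Field3": "Kim"},
]
found = [r for r in records if matches("Jones", r)]
print(len(found))  # 1 -- only the first record has Jones in any field
```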
Frankly, the fact that "Jones" may be in one of three fields and "WantUnique" isn't unique makes me wonder whether the structure of your table is really optimal for what you want to do here. Since you've presented the issue in the abstract, I can't tell for sure...
Thank you so much PhilModJunk! Using a MultiKey with List() has almost completely answered my question. The only problem I have now is when I search on the multikey field for Test for some reason the MultiUniqueCount field doesn't want to recognize CS6 as a unique ID for Test. There must be something wrong with the way I've implemented this so I bet you can point out the problem in seconds as I've added screenshots. Searching on the MultiKey works perfectly though it's only counting the unique records in the WantUnique field that I'm stuck on now.
What's weird is this seems to work when searching any of the other names, just not Test (maybe it's a glitch?). Thanks again! I did try to create the original join using GlobalSearch = MultiKey, but the result was the same.
To answer your questions: my experience with Summary fields in a database that contains over 9 million records and 850+ fields is that any change I make to the database (i.e., adding/deleting fields, etc.) makes the Summary fields recalculate, and you can guess how long that takes with just one summary field, let alone several dozen.
To give you a better understanding, my database represents baseball play-by-play data from 1948-2011. The fields Field1, Field2, Field3 represent whether the player was playing first base, second base, third base, etc. during the game. The WantUnique field represents a single baseball game, so a single player could be listed several times for a single game in multiple fields. I've tried to keep my example simple, though, so nothing gets lost in translation and others can easily apply this to their databases.
The above screenshot is the result of searching on Test in the MultiKey field, notice record #8 should have a value of 1 in MultiKeyUnique but it doesn't for some reason.
Below is a screenshot of all the records, notice all MultiKeyUnique values are correct in this case.
This is the defined fields
This is the self-join relationship.
I did indicate that a multi-key would not solve your problem as it doesn't get you to unique counts.
Personally, I'd restructure the data so that instead of separate fields for each baseball position I'd have separate records with a position field that reads "first base", "second base", etc...
You can also use a self join to filter out second and third instances of a given value in a field. When you create a new record, a looked-up value setting attempts to copy a value from a record with the same combination of values. If such a record exists, the value is copied; if it does not, the field stays empty. A calculation field then inverts this by returning a value only when the lookup field is empty. You can then sum or count the values in this last field to get a count of unique values.
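The first-instance trick can be sketched like this (a Python model of the logic with hypothetical game IDs, not FileMaker's lookup mechanism): flag a record only when no earlier record carries the same value, then count the flags.

```python
def mark_first_instances(values):
    """Return 1 for the first record carrying each value, 0 otherwise --
    mirroring the lookup trick: the lookup stays empty when no earlier
    record has the same value, and the calc field inverts that."""
    seen = set()
    flags = []
    for v in values:
        flags.append(0 if v in seen else 1)
        seen.add(v)
    return flags

games = ["G1", "G1", "G2", "G3", "G3"]
print(sum(mark_first_instances(games)))  # 3 unique values
```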
Using summary fields will not appreciably slow down your screen updates if you use them carefully. The trick is to never put them on a layout where they are not needed (they do not evaluate unless displayed on a layout) and to not bring up large sets of records on layouts where they are used. If you are only bringing up records for a small number of records at a time (a specific game, a specific player's career, etc) a layout with summary fields should work.
If interested, here's a description of the "sum the reciprocal" method for counting unique values: How to count the number of unique occurrences in field.
Restructuring the data isn't even a remote possibility. How would you separate the data when each record (over 9 million) in my database contains a single play in a baseball game? One record could be a double play that involves a batter, a pitcher, a second baseman, a third baseman, and baserunners, plus where the ball was hit, inning, outs, home field, etc.
Anyway, I decided to give Summary fields another try and attempted the "sum the reciprocal" method you mentioned for my example database, but I am not having any success. I must have something wrong, but after triple-checking it looks as if I've copied your solution exactly (sorting on the "Key" field from my example leads to this result). I'll post the pic of the fields I added too. What am I doing wrong? Also, you would think that with all the YEARS people have asked "How do you count unique instances of a field" questions, there would be a built-in function from FileMaker by now for doing this, sheesh! :-)
Thanks again for your help though. This should be soooo easy, but I'm beginning to think it's not possible with FileMaker.
As you can see above CFraction and SUniqueCount fields don't seem to be adding up correctly.
Here are the newly created fields to implement the "Sum the reciprocal" method.
Restructuring the data isn't even a remote possiblity
I must respectfully disagree. In fact, the more data you have, the more crucial it is to get your data into the correct structure in order to efficiently work with that large mass of data. With an inefficient structure, the efforts to work with your data become increasingly complex and the sheer volume of data involved will bring your system to its knees.
Using a script to restructure your data so that one play comprises multiple records with one record for each position involved in the play is definitely possible to do, though not something to jump into without a lot of careful analysis and planning.
With regards to the sum the reciprocal method, the records must be sorted to group them by the correct field or it will not return the right value. Key does not look like the correct field here, since it is an auto-entered serial number--which implies that it will be a unique value for every record.
Before we get too lost in the details here, let's back up and return to the set up and purpose of your actual table. So far, I know that one record = 1 baseball play and that the fields involved identify the field positions involved in the play (correct?). What data is in the "wantUnique" field that you want to count here? What is the purpose of this field and the data stored in it?
I agree about not getting lost in the details, so we should probably stick with just the generic example. Once I have a solution for the example, I can apply it to the main database. I sense that you want to map my generic field names to real-world ones, but I don't think it's necessary right now. Since you asked, though: the WantUnique field represents a single complete baseball game in the real database and is a text field. The same ID will show up in roughly 100 records, since there are about that many plays in a game. So the main goal is counting the total games for a single player over the course of his career.
So getting the unique count of the WantUnique field based on any search in my generic example (eventually a search on the multikey field) is the only goal I'm concerned with right now. I'll figure out the real database later.
Once again I appreciate all your help!
Actually, I wanted to look at the actual purpose, structure of your database so that we can consider more efficient options than what we've discussed so far. The danger to using an abstract example file is that generalizing it to the real table may not produce the optimum result due to my making assumptions about your real data that are incorrect. In this case, I was betting that the want unique field might be a team identifier instead of a game identifier.
For me the solution here is to wipe out all previous suggestions and start over with a structural solution that exploits the power of a relational database system.
If you do not already have one, add a table to your database named Games where you have one record for each game and then this field in that table uniquely identifies each game. Now we can use a relationship between your two tables to count the number of games.
Games::GameID = Plays::GameID
Then we need a table for each player. Surely you have one?
Plays>-----<Players. This is a many-to-many relationship, since one play can involve multiple players and one player can be involved in multiple plays. Since we have, I believe, multiple fields in each play that store a PlayerID (the part I would suggest you eventually change), a multi-key of positions can be used in place of the "join" table I would recommend you transition to in a future version of your system.
Plays::cPositionMulti-Key = Players::PlayerID
Now define a summary field, sGameCount, as the count of Games::GameID in the Games table.
To see the count of games played by a given player over an entire career, put sGameCount on a layout based on Players. To selectively control which games were played by a given player (such as limiting the count to a specific season), put this same field inside a one-row filtered portal, where the portal filter controls which related records are part of the total counted by this summary field.
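The relational approach above can be modeled in a few lines of Python (hypothetical play records and player IDs, just to show the logic, not FileMaker behavior): following the relationship from a player to his plays and counting distinct GameIDs gives the career game count.

```python
# Hypothetical Plays records: each play carries a game ID and the
# players involved (standing in for the multi-key of position fields).
plays = [
    {"GameID": "G1", "Players": ["P1", "P2"]},
    {"GameID": "G1", "Players": ["P1"]},
    {"GameID": "G2", "Players": ["P1", "P3"]},
]

def career_game_count(player_id, plays):
    """Count distinct games a player appears in -- the equivalent of
    viewing the sGameCount summary through the relationship
    Plays::cPositionMulti-Key = Players::PlayerID."""
    return len({p["GameID"] for p in plays if player_id in p["Players"]})

print(career_game_count("P1", plays))  # P1 appears in 2 distinct games
```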
I appreciate you taking the time to help me optimize my database, but I just want an answer for the sample database. If you want to play around with the data, it can be found in its entirety at Retrosheet.org.
All I want is a solution to the fairly simple database example I presented. Using the Sum reciprocal method doesn't seem to work and from the examples I've seen it is always based on counting a serial number field just like the Key field in my example.
So is there a way to perform a find on the multikey field and then count the unique occurrences of the WantUnique field? This should be very straightforward, but nothing seems to work because of the multikey. I'm beginning to think this is not possible.
P.S. in my real database or even in this sample, if only two fields are concerned getting a unique count is simple with a self-join.
Without the correct structure, and an understanding of it and how it needs to work, any work with your sample database is a waste of time, as a solution that works in the sample is unlikely to work for you in the real database.