Define a field called "aliases" in your "good" data.
That field, in your example, would look like this:
Smith & Wesson
Smith and Wesson
Now form a relationship between the data you're trying to fix (your "bad" data), and this field.
Records in the "bad" that match any of the values listed in the "good" aleases will be related, so you can look up the desired name from "good".
I'm not sure I fully understand what you're doing, but would it be easier (and more bulletproof) to have a unique ID tied to the name and use the unique ID for an internal join table that ties names to IDs? Then you can simply have a relationship between that join table and the incoming names. Any names that don't exist are flagged to be added. Example:
ID Standard Name Address
1 Smith & Wesson 101 Firearms Way, Someplace, Wherever
1 Smith & Wesson
1 Smith and Wesson
Then, wherever you refer to that vendor, you use ID = 1, and pull everything from the Vendor table.
Apologies if I've misconstrued your need and / or setup.
Yes, and you can have a "multi-line" key relationship:
Two fields (at least)
Standard Name, Possible Matches
Smith and Wesson, Smith and Wesson¶
Smith & Wesson¶
The second field (Possible Matches) is RETURN-DELIMITED VALUES and is indexed. Use it to match (relate to) your "entry field". Then use that relationship to bring in the "standard" value.
Well Hi Beverly! Long time no talk
OK let me explain further...
The data we're getting from the vendors is massive and frequent. Inventories changes, items discontinue and new are added. Sometimes new categories are added that accompany these updated arrays, CSV, XML...however the different vendors send it.
Currently all of that data combines to total 595,890 records with all the vendors data and varying namings. There are separate tables for each vendor's data (not really relevant) but all of the data from those vendor tables are combined in this massive array they create.
To explain why they get an array, or rather, what they do with it:
A product may import with an array that creates 20 records in FM. Those 20 records all have the same unique ID (ItemCode) which is used to create a Technical Specs list like this:
Retail Price: 21.42
Item Class: 03009
Item Group: 03
Consumer Description: Handgun Ammunition .380 Auto 92 Grain Full Metal Jacket
Quantity Case: 20
Bullet Weight: 92 gr.
Item Code: SB380A
Caliber: .380 ACP
Catalog Vendor Name: SELLIER AND BELLOT
Quantity per Box: 50
Item Status: OPEN
Wildcat Eligible: N
Item Description: SBT 380AUTO 92 FMJ 50
Primary Vendor: 52155
Caliber-Gauge: .380 ACP
Bullet Style: FMJ
Model Header: .380 Auto
Master Model Body Copy: Designed to deliver good accuracy and positive function in all handguns with no barrel leading or bullet expansion.
Master Model Header: Pistol Ammunition
That's the way they do it and that's what I have to work with. I use ExecuteSQL to parse out the product details for other fields in another table called Products. That table will receive the Category pulled for the array.
So now I have a Product with a corresponding Category for a given vendor.
There is no way to define a UUID that relates to anything but the single record being imported, and certainly no way these data can be matched between the vendors because there is not a common ID shared between vendors. So I have no base for a join table for lookups or a way to effectively create it because I have no definitions. List doesn't line up S&W, Smith and Wesson, Smith & Wesson as related because the relationship doesn't exist between the vendors. I assumed I would have to create it manually, thus the question. Hopefully there is a solution because I don't see it yet other than manually building the join table or writing a Case.
It occurred to me that the products should have a UPC or manufacturer code in common across all vendors. That may be my match field to build a join table and get part of the way there. Have to dig in the array. BRB
You create your own unique ID code. (I don't think anybody said you could use one the vendor creates.) And yes, you have to create your own join table. And yes, to some degree, it has to be done manually. At least for vendor names you don't already have.
However, it's a simple matter for FileMaker to join the records to the incoming data and omit the ones that already exist. After that, it's up to you to go through the data for the ones that don't exist and assign them to vendors. You can either do that by creating a new join record, or, as Chris and Beverly have suggested, add a line to the multi-key. Different method, same general principle.
But no, there's no way for the computer to read your mind and say, "Gee, they said 'Smith & Wesson', but they really mean 'S&W'." You have to provide the brains. Computers just aren't that smart.
Hey Jesse! you may be able to calculate some "loose matches" to get you close, but a CASE() would be far more complex than the "lookup" table. And yes, this would take time to build, but would be easier to modify with new matches to the 'standard' value. The human factor will be needed.
I have done some 'data cleaning' using find/replace, but I have to have something to find on. You'll need to have a way to know what's valid.
Find "s w" or "smith" or "wesson", for example.
you may get things that you know wouldn't be correct, so it would need further data manipulation.
Keeping an alias dictionary or a programmatic alias translator in sync with vendor catalogs that are always in flux is going to be a time consuming manual task. (Read Full Time Job)
Actually, we just did something very similar for a magazine vendor with vendors who name the magazines differently. On import, if any "new" magazine names are encountered, the user is prompted to either select an existing magazine (in which case the new name is added to that magazine's aliases) or create a new one.
Of course, if there's a SKU or something that can be used for more reliable matches, that would be easier. Absent that, though, it may be work but a) it's a necessary job and b) it can be at least somewhat automated.
Yes. The point is, you can't remove the human from the process.
Looks like I haven't conveyed it adequately (or I have) because nobody has really touched on what I thought I was asking. Sorry.
I'm aware of what I can and can't do with this data, what FileMaker and "computers" can't do, and how a relational database works. I wasn't asking how to do this if I had all the pieces. That is simple and I'm sorry I wasted your time if you thought that was what I was asking.
The question I was trying to ask was...based on this massive dataset, do you see a way to dynamically build the join table and compare it to a table of categories defined as the standard name for the categories...the final process once I get the dataset categories standardised (intermediate step).
So with the 595K record data set could anybody think of a way to produce the join table. I understand there is little go on or work with in the dataset. Having written all this out I see it was kind of a stupid question because it is clearly an impossible feat to automate without a defined join table and can't be produced from all the unreacted data from all the vendors.
I guess I was hoping that somebody smarter than me would see something I'm overlooking and have a creative solution. I knew it was a pipe dream but wanted to throw it out there. I only see a manual solution and manual maintenance.
I have an idea for a partial solution...besides getting an intern
Thanks to all for taking the time to reply.
Thanks again. I can partially automate, and grouping the vendor data by product to output a list of all the category namings has produced the start of my join table. It will be up to me to manually define the "official" category name.
That is the big and small of it exactly. Thanks for your input Chris!
Thanks Beverly yeah a case just sounds miserable...
I was going to sort the full vendor's product list by the product next, to group them and look at just how different the category namings are from each other and get a better idea of what I'm dealing with. I can look at "data cleaning" to expedite the next step. It will come together eventually...hopefully before the new year