1 2 Previous Next 15 Replies Latest reply on Jun 17, 2015 10:48 AM by disabled_jackrodgers

    Take data in one table and standardise the naming in a field using a defined list in another table

    Jesse_Wright

      To explain, we get products from many vendors, each has their own way to define categories. I need a lookup table to convert them all into a single table standardising the categories. Some vendors have just a small change in spelling i.e S&W vs Smith and Wesson or Smith & Wesson. So if Smith & Wesson is the name defined as the standard, then S&W and Smith and Wesson will be converted to the standard name based on a second table.

       

      I could write a case calc instead of a lookup table (since I don't think that will work), but that would be massive and near impossible to keep current as they will add new Categories and I would not know about it. Import would add a new records after updating the dataset, but the case wouldn't have a statement to accommodate the new Category.

       

      My thought was to run the import data through a lookup table to copy over the standardised name, then I could compare the new list to the lookup table and add any new categories or flag them so I can periodically check for new and define the standard name.

       

      Issue is that I can't think of a way to dynamically define the different name spellings so that the lookup table has the criteria needed to match it to a standard name. Which brings me back to a Case calc.

       

      That updated and standardised category list will be pushed to update an e-commerce website, before the updated and standardised product list is updated. This way I don't have to reset. (Sorry, obscure and not really related but I wanted to kind of explain the end goal to some degree.)

       

      Thanks for letting me pick your brains.

      Best!

      Jesse


        • 1. Re: Take data in one table and standardise the naming in a field using a defined list in another table
          Extensitech

          Define a field called "aliases" in your "good" data.

           

          That field, in your example, would look like this:

           

          Smith & Wesson

          S&W

          Smith and Wesson

          Now form a relationship between the data you're trying to fix (your "bad" data), and this field.

           

          Records in the "bad" that match any of the values listed in the "good" aleases will be related, so you can look up the desired name from "good".

           

          HTH

           

          Chris Cain

          Extensitech

          • 2. Re: Take data in one table and standardise the naming in a field using a defined list in another table
            Mike_Mitchell

            I'm not sure I fully understand what you're doing, but would it be easier (and more bulletproof) to have a unique ID tied to the name and use the unique ID for an internal join table that ties names to IDs? Then you can simply have a relationship between that join table and the incoming names. Any names that don't exist are flagged to be added. Example:

             

            Vendor table

            ID                    Standard Name                         Address

            1                      Smith & Wesson                      101 Firearms Way, Someplace, Wherever

             

            Join table

            ID                    Name

            1                     Smith & Wesson

            1                     S&W

            1                     Smith and Wesson

             

            Then, wherever you refer to that vendor, you use ID = 1, and pull everything from the Vendor table.

             

            Apologies if I've misconstrued your need and / or setup.

             

            Mike

            • 3. Re: Take data in one table and standardise the naming in a field using a defined list in another table
              beverly

              Yes, and you can have a "multi-line" key relationship:

              Two fields (at least)

               

              Standard Name, Possible Matches

               

              Smith and Wesson, Smith and Wesson¶

                          Smith & Wesson¶

                          S&W

                          ...

               

              The second field (Possible Matches) is RETURN-DELIMITED VALUES and is indexed. Use it to match (relate to) your "entry field". Then use that relationship to bring in the "standard" value.

               

              beverly

              • 4. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                Jesse_Wright

                Well Hi Beverly! Long time no talk

                 

                OK let me explain further...

                 

                The data we're getting from the vendors is massive and frequent.  Inventories changes, items discontinue and new are added. Sometimes new categories are added that accompany these updated arrays, CSV, XML...however the different vendors send it.

                 

                Currently all of that data combines to total 595,890 records with all the vendors data and varying namings. There are separate tables for each vendor's data (not really relevant) but all of the data from those vendor tables are combined in this massive array they create.

                 

                To explain why they get an array, or rather, what they do with it:


                A product may import with an array that creates 20 records in FM. Those 20 records all have the same unique ID (ItemCode)  which is used to create a Technical Specs list like this:

                 

                Retail Price: 21.42

                Item Class: 03009

                Item Group: 03

                Consumer Description: Handgun Ammunition .380 Auto 92 Grain Full Metal Jacket

                Quantity Case: 20

                Bullet Weight: 92 gr.

                Item Code: SB380A

                Caliber: .380 ACP

                Catalog Vendor Name: SELLIER AND BELLOT

                Quantity per Box: 50

                Item Status: OPEN

                Wildcat Eligible: N

                Item Description: SBT 380AUTO 92 FMJ 50

                Primary Vendor: 52155

                Caliber-Gauge: .380 ACP

                InventoryTyp: HRD

                Bullet Style: FMJ

                ProductPageNumber: n/a

                Model Header: .380 Auto

                Master Model Body Copy: Designed to deliver good accuracy and positive function in all handguns with no barrel leading or bullet expansion.

                Master Model Header: Pistol Ammunition

                 

                That's the way they do it and that's what I have to work with. I use ExecuteSQL to parse out the product details for other fields in another table called Products. That table will receive the Category pulled for the array.

                 

                So now I have a Product with a corresponding Category for a given vendor.


                There is no way to define a UUID that relates to anything but the single record being imported, and certainly no way these data can be matched between the vendors because there is not a common ID shared between vendors. So I have no base for a join table for lookups or a way to effectively create it because I have no definitions.  List doesn't line up S&W, Smith and Wesson, Smith & Wesson as related because the relationship doesn't exist between the vendors. I assumed I would have to create it manually, thus the question. Hopefully there is a solution because I don't see it yet other than manually building the join table or writing a Case.

                 

                Thanks everyone!

                • 5. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                  Jesse_Wright

                  It occurred to me that the products should have a UPC or manufacturer code in common across all vendors. That may be my match field to build a join table and get part of the way there. Have to dig in the array. BRB

                  • 6. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                    Mike_Mitchell

                    You create your own unique ID code. (I don't think anybody said you could use one the vendor creates.) And yes, you have to create your own join table. And yes, to some degree, it has to be done manually. At least for vendor names you don't already have.

                     

                    However, it's a simple matter for FileMaker to join the records to the incoming data and omit the ones that already exist. After that, it's up to you to go through the data for the ones that don't exist and assign them to vendors. You can either do that by creating a new join record, or, as Chris and Beverly have suggested, add a line to the multi-key. Different method, same general principle.

                     

                    But no, there's no way for the computer to read your mind and say, "Gee, they said 'Smith & Wesson', but they really mean 'S&W'." You have to provide the brains. Computers just aren't that smart.  

                    • 7. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                      beverly

                      Hey Jesse! you may be able to calculate some "loose matches" to get you close, but a CASE() would be far more complex than the "lookup" table. And yes, this would take time to build, but would be easier to modify with new matches to the 'standard' value. The human factor will be needed.

                       

                      I have done some 'data cleaning' using find/replace, but I have to have something to find on. You'll need to have a way to know what's valid.

                           Find "s w" or "smith" or "wesson", for example.

                      you may get things that you know wouldn't be correct, so it would need further data manipulation.

                       

                      beverly

                      • 8. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                        coherentkris

                        Keeping an alias dictionary or a programmatic alias translator in sync with vendor catalogs that are always in flux is going to be a time consuming manual task. (Read Full Time Job)

                        • 9. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                          Extensitech

                          Actually, we just did something very similar for a magazine vendor with vendors who name the magazines differently. On import, if any "new" magazine names are encountered, the user is prompted to either select an existing magazine (in which case the new name is added to that magazine's aliases) or create a new one.

                           

                          Of course, if there's a SKU or something that can be used for more reliable matches, that would be easier. Absent that, though, it may be work but a) it's a necessary job and b) it can be at least somewhat automated.

                           

                          Chris Cain

                          Extensitech

                          • 10. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                            Mike_Mitchell

                            Yes. The point is, you can't remove the human from the process.

                            • 11. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                              Jesse_Wright

                              Looks like I haven't conveyed it adequately (or I have) because nobody has really touched on what I thought I was asking. Sorry.

                               

                              I'm aware of what I can and can't do with this data, what FileMaker and "computers" can't do, and how a relational database works. I wasn't asking how to do this if I had all the pieces. That is simple and I'm sorry I wasted your time if you thought that was what I was asking.

                               

                              The question I was trying to ask was...based on this massive dataset, do you see a way to dynamically build the join table and compare it to a table of categories defined as the standard name for the categories...the final process once I get the dataset categories standardised (intermediate step).

                               

                              So with the 595K record data set could anybody think of a way to produce the join table. I understand there is little go on or work with in the dataset. Having written all this out I see it was kind of a stupid question because it is clearly an impossible feat to automate without a defined join table and can't be produced from all the unreacted data from all the vendors.

                               

                              I guess I was hoping that somebody smarter than me would see something I'm overlooking and have a creative solution. I knew it was a pipe dream but wanted to throw it out there. I only see a manual solution and manual maintenance.

                               

                              I have an idea for a partial solution...besides getting an intern

                               

                              Thanks to all for taking the time to reply.

                              Best!

                              Jesse

                              • 12. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                                Jesse_Wright

                                Agreed.

                                 

                                Thanks again. I can partially automate, and grouping the vendor data by product to output a list of all the category namings has produced the start of my join table. It will be up to me to manually define the "official" category name.

                                • 13. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                                  Jesse_Wright

                                  That is the big and small of it exactly. Thanks for your input Chris!

                                  • 14. Re: Take data in one table and standardise the naming in a field using a defined list in another table
                                    Jesse_Wright

                                    Thanks Beverly yeah a case just sounds miserable...

                                    I was going to sort the full vendor's product list by the product next, to group them and look at just how different the category namings are from each other and get a better idea of what I'm dealing with. I can look at "data cleaning" to expedite the next step. It will come together eventually...hopefully before the new year

                                    1 2 Previous Next