8 Replies Latest reply on Feb 9, 2017 3:52 AM by Wicktor

    Portal and Duplicates

    Wicktor

      I have been asked to take over a retiring colleague and to solve some problems of a previously developed solution.

      The solution is complex but this particular Table is a big issue for me.

       

      A PRODUCTS table is structured as follows:

      Client_Name

      Date

      Product1_name

      Product1_details1 (list of values)

      Product1_details2 (list of values)

      Product1_details3 (list of values)

      Product1_key (text calculated: Product1 & Details1 & Details2 & Details3)

      Product2_name

      Product2_details1 (list of values)

      Product2_details2 (list of values)

      Product2_details3 (list of values)

      Product2_key (text calculated: Product2 & Details1 & Details2 & Details3)

       

      The Table has some 300.000 records and a significant number of duplicates, i.e. different Clients may have purchased many Product1 and/or Product2 with various values in each of Details fields.

       

      Of course my first observation was that the Table is not well structured and the funny aspect is that Product1 and Product2 might also have the same name and same details.

       

      I have been asked to produce a portal which shows only unique values of Products and related Details.

      I have already tagged duplicates adding a Serial Number as PrimaryKey, two new self join relationship.

      PRODUCTS <-> PRODUCTS_key1

      PRODUCTS <-> PRODUCTS_key2

      and a new field If (PrimaryKey = PRODUCTS_key1::PrimaryKey or PrimaryKey = PRODUCTS_key2::PrimaryKey ; ""; "Duplicate")

      I think I am “almost there” but I miss how to show a unique list of Product/Detail1/Detail2/Detail3

       

      Before suggesting to totally destroy this Table and create a new one with correct structure I wonder if there is a faster solution to the problem.

       

      Thanks for any help,

      Victor

        • 1. Re: Portal and Duplicates
          Mike_Mitchell

          Wicktor wrote:

           

          Before suggesting to totally destroy this Table and create a new one with correct structure I wonder if there is a faster solution to the problem.

           

          When choosing between "fast" and "right", I usually try to choose "right", because it saves trouble down the road (such as the trouble in which you now find yourself).

           

          However, if all you're trying to do is identify the duplicates that have been entered, you can do that by exporting to two external files (one with product 1, one with product 2), combining them (via another import), and then doing a self-join as you've suggested. But that will be a static solution; as people add data to the system, you'll have to repeat the process (this time exporting to the single, combined table from product 1 and 2 in two steps).

          1 of 1 people found this helpful
          • 2. Re: Portal and Duplicates
            Wicktor

            Hi Mike_Mitchell

            yes, I agree that "right" is better and I don't want to come back to similar issue in the future.

            I will export into two separate files Product1 and Product2, I will delete from the Table all fields belonging to Product2, then rename all fields belonging to Product1 into "Product" and then re-import into a single Table.

            One more issue I found: the price in each record is a single field with price of both products together

            That will be a nightmare !

            • 3. Re: Portal and Duplicates
              coherentkris

              "Of course my first observation was that the Table is not well structured "

              you hit the nail on the head here.

              Anytime you see fields named entity1, entity2, entityN then it is highly likley that the entity needs to be in a table of its own.

               

              Welcome to the world of inheriting other peoples work.

              • 4. Re: Portal and Duplicates
                siplus

                The first problem I see is that ProductX_detailsY fields might contain the same values but in different order. The ¶'s will mess up the index, too, therefore

                 

                Product1_key (text calculated: Product1 & Details1 & Details2 & Details3) is a bad idea IMHO.

                 

                With the free BaseElements plugin installed, you can have a better key, like the one in the attached example.

                 

                Sorting on it and exporting summarized will give you all the combos of product and details ever chosen by a client.

                1 of 1 people found this helpful
                • 5. Re: Portal and Duplicates
                  Wicktor

                  Many thanks to everyone.

                   

                  I tried to export a sample of 200 records into 2 different files, made a clone of the solution, then re-imported and fixed some calculations.

                  Seems to work fine, iIt will take some time but, well, I can manage that.

                  The real issue is the "Price": the actual Products Table has a single Price field for each record which might be for either Product1, Product2 or both together (and evil idea !!!) and splitting the price will be a nightmare.

                   

                  But, even worst, the biggest issue is that the Invoice Table pulls the price of the related Product by Date

                  Combining Product1 and Products2 into a single Table, if a client purchases two products, he has two records in the same date and the Invoice pulls the Price only the first created record for that date.

                  So I will need to remake also that...too tired to understand how...

                  • 7. Re: Portal and Duplicates
                    philmodjunk

                    the biggest issue is that the Invoice Table pulls the price of the related Product by Date

                    Combining Product1 and Products2 into a single Table, if a client purchases two products, he has two records in the same date and the Invoice pulls the Price only the first created record for that date.

                     

                    The pairing of two products in the same record is unusual. It implies some sort of relationship between the two that you may need to respect even as you redesign your database.

                     

                    But you can fix a lot with the following set up:

                     

                    Invoice----<LineItems>-----ProductPricing   ----< means "one to many"

                     

                    Invoice::__pkInvoiceID = LineItems::_fkInvoiceID

                     

                    ProductPricing::__pkProductID = LineITems::_fkProductID AND

                    ProductPricing::EffectiveDate <= LineItems::OrderDate

                     

                    Sort the second relationship by EffectiveDate in Ascending order to make the most recent product price with the given product ID and an effective date on or before the order date the first related record and thus the source of pricing data looked up into LineItems.

                     

                    (LineItems would be your new table to replace the table you describe in your original post.)

                    1 of 1 people found this helpful
                    • 8. Re: Portal and Duplicates
                      Wicktor

                      philmodjunk

                      thank you for the hint I will work on that