
Normalising overlapping ranges

Question asked by onefish on Apr 15, 2018
Latest reply on Apr 16, 2018 by onefish

Hi, this has been bothering me a bit and I am struggling to see a clear winning solution. I have a dataset of some 9,000 records which I've OCR'd and flattened from about 40 pages of scanned PDF cross-tab layout tables (2-dimensional data). I am trying to come up with the best normalisation strategy. The data will be related to 3 different tables. The first 2 relationships are straightforward primary/foreign key matches. The third is where things get a bit tricky.

 

The original data (and my flattened version of it) was structured as ranges on both the X and Y axes. The relationship between the parent and this table will be based on matching ForeignKey1 AND ForeignKey2 AND a single number in the parent falling within the range X_from to X_to AND another single number in the parent falling within the range Y_from to Y_to.
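
Roughly, the match works like this (a quick Python sketch just to make the rule concrete; x_value and y_value are placeholder names for the parent's two single numbers, not real field names):

    def matches(parent, child):
        """True if a parent record relates to one of the flattened records."""
        return (
            parent["ForeignKey1"] == child["ForeignKey1"]
            and parent["ForeignKey2"] == child["ForeignKey2"]
            and child["X_from"] <= parent["x_value"] <= child["X_to"]
            and child["Y_from"] <= parent["y_value"] <= child["Y_to"]
        )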

 

In my flattened table I currently have the following fields (simplified names for discussion):

ID

ForeignKey1

ForeignKey2

X_from

X_to

Y_from

Y_to

Data
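
In case it helps to see it written out, the flattened table above amounts to something like this (SQLite via Python, purely for illustration; the column types are my guesses):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE flattened (
            ID          INTEGER PRIMARY KEY,
            ForeignKey1 INTEGER NOT NULL,
            ForeignKey2 INTEGER NOT NULL,
            X_from      INTEGER NOT NULL,
            X_to        INTEGER NOT NULL,
            Y_from      INTEGER NOT NULL,
            Y_to        INTEGER NOT NULL,
            Data        TEXT
        )
    """)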

 

I can easily make the above relationships the way the data is structured, however... there are only 141 unique ranges in X_from - X_to, and only 50 unique ranges in Y_from - Y_to. These are repeated throughout the 9,000-odd records. Data normalisation rules tell me the ranges should be split out into separate tables, but herein lies the problem: they are not all consecutive and there are many overlapping ranges. They are still unique only in the sense that no two ranges overlap within records that share the same values in the two ForeignKey fields. As an example, the following ranges exist on the X axis:

 

1 - 450
1 - 595
1 - 600
1 - 610
1 - 630

 

So a single value of 400 in the parent record is in all the above ranges. Any suggestions on a normalisation strategy under these circumstances? Or just keep the duplication as it is and move on?
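
Just to make that option concrete, the split the normalisation rules point me towards would look roughly like this (again SQLite via Python as a sketch only; the lookup table and column names are made up): the 141 unique X ranges and the 50 unique Y ranges become lookup rows and the main table stores their IDs, with uniqueness only enforced per ForeignKey1/ForeignKey2 combination.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE x_range (
            x_range_id INTEGER PRIMARY KEY,
            X_from     INTEGER NOT NULL,
            X_to       INTEGER NOT NULL,
            UNIQUE (X_from, X_to)   -- 141 rows; overlaps between rows are allowed
        );
        CREATE TABLE y_range (
            y_range_id INTEGER PRIMARY KEY,
            Y_from     INTEGER NOT NULL,
            Y_to       INTEGER NOT NULL,
            UNIQUE (Y_from, Y_to)   -- 50 rows; overlaps between rows are allowed
        );
        CREATE TABLE flattened (
            ID          INTEGER PRIMARY KEY,
            ForeignKey1 INTEGER NOT NULL,
            ForeignKey2 INTEGER NOT NULL,
            x_range_id  INTEGER NOT NULL REFERENCES x_range (x_range_id),
            y_range_id  INTEGER NOT NULL REFERENCES y_range (y_range_id),
            Data        TEXT,
            -- ranges only need to be unique within one
            -- ForeignKey1/ForeignKey2 combination
            UNIQUE (ForeignKey1, ForeignKey2, x_range_id, y_range_id)
        );
    """)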

 

Cheers!

 

P.S. I cannot just change the ranges to be linear.
