Hi, this has been bothering me a bit and I am struggling to see a clear winning solution. I have a dataset of some 9,000 records which I've OCR'd and flattened from about 40 pages of scanned PDF cross-tab layout tables (2-dimensional data). I am trying to come up with the best normalisation strategy. The data will be related to 3 different tables. The first 2 relationships are straightforward primary/foreign key matches. The third is where things get a bit tricky.
The original data (and my flattened version of it) was structured as ranges on both the X and Y axes. The relationship between the parent and this table will be based on matching ForeignKey1 AND ForeignKey2a AND a single number in the parent falling in the range between X_from and X_to AND another single number in the parent falling between Y_from and Y_to.
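To make sure I'm describing the match clearly, here's the predicate in Python pseudocode (field names like ForeignKey1, X_from, etc. are the simplified names I'm using for discussion, and the parent's X and Y columns are placeholders):

```python
def matches(parent, child):
    """True if a flattened range row (child) applies to a parent record.

    Both arguments are plain dicts; "X" and "Y" are the parent's two
    single-value columns that must fall inside the child's ranges.
    Range bounds are treated as inclusive on both ends.
    """
    return (
        parent["ForeignKey1"] == child["ForeignKey1"]
        and parent["ForeignKey2a"] == child["ForeignKey2a"]
        and child["X_from"] <= parent["X"] <= child["X_to"]
        and child["Y_from"] <= parent["Y"] <= child["Y_to"]
    )
```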
In my flattened table I currently have the following fields (simplified names for discussion):
I can easily make the above relationships the way the data is structured, however... there are only 141 unique ranges in X_from - X_to, and only 50 unique ranges in Y_from - Y_to. These are repeated throughout the 9,000-odd records. Data normalisation rules tell me the ranges should be split out into separate tables, but herein lies the problem. They are not all consecutive and there are many overlapping ranges. They are still unique because there are no overlapping ranges within records sharing the same values in the two ForeignKey fields. As an example, the following ranges exist in the X axis:
So a single value of 400 in the parent record falls in all of the above ranges. Any suggestions on a normalisation strategy under these circumstances? Or should I just keep the duplication as it is and move on?
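One option I've been turning over (sketched below, assuming the field names above plus a hypothetical Value column): since overlap doesn't matter if each (from, to) pair is treated as an opaque key, the 141 X ranges and 50 Y ranges could each get a surrogate ID in a small lookup table, with the flat table keeping just the IDs:

```python
def build_range_lookup(rows, lo_field, hi_field):
    """Assign a surrogate integer ID to each distinct (from, to) pair."""
    lookup = {}
    for row in rows:
        key = (row[lo_field], row[hi_field])
        if key not in lookup:
            lookup[key] = len(lookup) + 1
    return lookup

def normalise(rows):
    """Split the flattened rows into two range lookups plus a slim fact table."""
    x_ids = build_range_lookup(rows, "X_from", "X_to")
    y_ids = build_range_lookup(rows, "Y_from", "Y_to")
    slim = [
        {
            "ForeignKey1": r["ForeignKey1"],
            "ForeignKey2a": r["ForeignKey2a"],
            "X_range_id": x_ids[(r["X_from"], r["X_to"])],
            "Y_range_id": y_ids[(r["Y_from"], r["Y_to"])],
            "Value": r["Value"],  # hypothetical payload column
        }
        for r in rows
    ]
    return slim, x_ids, y_ids
```

The range-matching join would then go through the lookup tables, so the overlaps stay legal: uniqueness is only enforced per (ForeignKey1, ForeignKey2a), not across the lookup table itself. I'm not sure the extra join is worth it for 141 + 50 rows, which is really my question.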
P.S. I cannot just change the ranges to be linear