
cross tab performance suggestions - virtual list, denormalize or ??

Question asked by gethappy@happypc.com on Oct 7, 2016
Latest reply on Oct 8, 2016 by sccardais

I have an existing database with a cross-tab report (already built) that has speed issues. It was built years ago and works reliably, but I want to see if we can improve its speed.

 

First a little background ...

 

The cross-tab report is basically a glorified P&L statement. Each row is a product, and each column is a date range (January, February, March, all years prior, etc.). At the bottom of the report are summary fields that total the columns. The report is run from a PRODUCT table and pulls related financial data from a single table, TRANSACTIONS, which contains millions (literally) of records. We archive the data every year, but we are adding roughly 500,000 transactions every month, so the tables are fairly large and can't be reduced.

 

In the current version we use a ton of unstored calculations and relationships to create the report. For example, to show one product's March total, there is an unstored calculation on a relationship that matches all of that product's March transactions for the appropriate accountType; the calculation is simply "sum (relatedTransactions)".
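Roughly, what each cell in the report is doing today looks like this (a language-agnostic sketch in Python with made-up field names, just to illustrate the pattern, not actual FileMaker code):

from datetime import date

def cell_total(transactions, product_id, account_type, start, end):
    """Re-scan the related transactions for one product / account type / date range."""
    return sum(
        t["amount"]
        for t in transactions
        if t["product_id"] == product_id
        and t["account_type"] == account_type
        and start <= t["date"] <= end
    )

# e.g. one March cell:
# cell_total(all_txns, "P-1001", "revenue", date(2016, 3, 1), date(2016, 3, 31))
#
# A 100-product x 12-column report repeats this scan 1,200 times against the
# multi-million-row TRANSACTIONS table, which is why it crawls.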

 

The report is REALLY slow (not surprising, since it relies on unstored calculations), sometimes taking 3-30 minutes to load depending on the number of products being viewed (usually only 10-100 products). I have since learned the amazing powers of Virtual List and would have built the report with it if I had known the technique back when the report was created. But since the report is already built, I'm not sure recreating it with a Virtual List technique will improve the speed significantly. It obviously cuts down massively on the schema required, but I'm not sure about the speed, and if there are improvements, whether they would be substantial or only slight. My hope is that the report would load in seconds instead of minutes, maybe 30 seconds at most for a particularly large subset.
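For comparison, the way I picture a Virtual List version working is a single pass over the found set that aggregates everything in memory, so each virtual-list row just reads a precomputed value instead of firing its own unstored calculation (again a rough Python sketch with invented names, not FileMaker syntax):

from collections import defaultdict

def build_crosstab(transactions, product_ids, months):
    """One pass over the found set; each virtual-list row then reads from memory."""
    totals = defaultdict(float)                      # keyed by (product_id, "YYYY-MM")
    for t in transactions:
        totals[(t["product_id"], t["date"].strftime("%Y-%m"))] += t["amount"]

    # One virtual-list "row" per product, one value per month column.
    return {p: [totals[(p, m)] for m in months] for p in product_ids}

# e.g. build_crosstab(found_txns, ["P-1001", "P-1002"], ["2016-01", "2016-02", "2016-03"])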

 

I am very seriously considering denormalizing the data, which I know would make the report lightning fast, but denormalized financials create a huge nightmare, especially when deleting existing transactions. I have denormalized similar solutions, but I have been frustrated in the past when a user wants to delete a set of data (say, an invoice that affects 300 products) and is forced to wait while the denormalized data is updated to account for the deleted transactions. It's also frustrating when transactions are still being updated in the background and a user has to wait for a layout to load because it relies on denormalized data. I've also set up server-side scripts to run at night as a safety measure to keep the denormalized data in sync, but I'm not a fan of those processes ~ it shakes my confidence knowing they have to be there 'just in case'.

 

My gut says denormalizing is the only way to get the speed I want, but at the expense of making users wait when they delete or update data. I'm hoping some of you will weigh in, especially those dealing with large data sets.

 

If denormalizing is the answer here, I was thinking I would just need three number fields added to each transaction record ...

1) total revenue for the month for this product

2) total revenue for all time (past and future)

3) total revenue for all time up to and including this transaction

Using these, I think the report would be quite quick to load, most likely via a virtual list technique that grabs the above three fields from the last record of each month (i.e., each column).
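To make the idea concrete, here is a rough sketch of how I imagine those three stored fields being computed per product (Python with invented field names, purely to illustrate the logic):

from collections import defaultdict

def stamp_denormalized_fields(transactions):
    """Write the three proposed stored fields onto every transaction, per product."""
    by_product = defaultdict(list)
    for t in transactions:
        by_product[t["product_id"]].append(t)

    for txns in by_product.values():
        txns.sort(key=lambda t: t["date"])
        all_time = sum(t["amount"] for t in txns)                            # field 2
        month_totals = defaultdict(float)
        for t in txns:
            month_totals[t["date"].strftime("%Y-%m")] += t["amount"]
        running = 0.0
        for t in txns:
            running += t["amount"]
            t["month_total"] = month_totals[t["date"].strftime("%Y-%m")]     # field 1
            t["all_time_total"] = all_time                                   # field 2
            t["running_total"] = running                                     # field 3

# The report would then only touch the last transaction of each month per product;
# its three stored fields give each column value directly, with no unstored calcs.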

 

It's a very simple change, except when a user wants to delete data mid-month. A simple example: you have 100 transactions in the month and a user deletes two that occurred in the middle of the month. I assume we would then need to update the monthly totals on the 98 remaining transactions. Then, to make matters worse, we would also need to update all the transactions in the same year to account for the 2 deleted transactions. I'm sure I'm being dense here, but this is why I am shying away from denormalizing in this case. It sounds like a nightmare.
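Sketching the repair work a mid-month delete would force (same caveats as above: Python, invented names, assuming the three stored fields) makes the scope of the problem obvious to me:

from collections import defaultdict

def delete_and_repair(transactions, ids_to_delete):
    """Delete some transactions, then fix the stored totals on everything left behind."""
    doomed = [t for t in transactions if t["id"] in ids_to_delete]
    remaining = [t for t in transactions if t["id"] not in ids_to_delete]

    removed_total = defaultdict(float)        # per product
    removed_by_month = defaultdict(float)     # per (product, month)
    for d in doomed:
        removed_total[d["product_id"]] += d["amount"]
        removed_by_month[(d["product_id"], d["date"].strftime("%Y-%m"))] += d["amount"]

    for t in remaining:
        pid, month = t["product_id"], t["date"].strftime("%Y-%m")
        # field 2: every remaining record for the product must be rewritten
        t["all_time_total"] -= removed_total[pid]
        # field 1: every remaining record in an affected month must be rewritten
        t["month_total"] -= removed_by_month[(pid, month)]
        # field 3: drop by whatever was deleted at or before this record's date
        t["running_total"] -= sum(
            d["amount"] for d in doomed
            if d["product_id"] == pid and d["date"] <= t["date"]
        )
    return remaining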

 

Thank you for any thoughts you can provide.
