I have a database with roughly 15 million records, and I'm trying to find a way to speed up reporting that summarizes large blocks of data. This is somewhat of a general question about how best to handle calculations over large data sets.
It is more or less an accounting system with a running list of all transactions for all products for this company.
When there is a sale, a transaction record is created showing the sale. When a product is returned, a transaction is created with a negative value.
No records are EVER deleted. If a user needs to delete a transaction, the system creates a reversing contra-transaction, in effect backing out the original. So in this case you would have the original transaction with a sale of $10.00 and a second transaction with a sale of -$10.00. I'm oversimplifying, but you get the gist.
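To make sure I'm describing the model clearly, here is the append-only idea sketched in Python (not FileMaker; the class and field names are just made up for illustration):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Transaction:
    product_id: str
    txn_date: date
    amount: float  # negative for returns and reversals

ledger = []  # append-only list; nothing is ever removed

def record_sale(product_id, txn_date, amount):
    ledger.append(Transaction(product_id, txn_date, amount))

def reverse_transaction(txn):
    # Never delete: append a contra-transaction with the sign flipped,
    # so the two rows net to zero.
    ledger.append(Transaction(txn.product_id, txn.txn_date, -txn.amount))

record_sale("P1", date(2024, 1, 5), 10.00)
reverse_transaction(ledger[0])
print(sum(t.amount for t in ledger))  # nets to 0.0
```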
I'm trying to find a quick way for reporting to get a grand total of sales over time. Currently we are using unstored calculations, but a typical monthly cross-tab report (not surprisingly) takes 10+ minutes to load when the report shows sales by month across the year for 20 - 100 products.
While brainstorming, I thought: maybe if I store the month-to-date total for each product in that transaction's record (in a number field), I could use that to quickly grab totals for a month. So when a transaction is created, it not only stores the sale details, it also stores the running total of sales to date for that one product for that one month. Let's say there are three transactions for this product this month, each for $10.00. The first transaction would show a monthly total of $10.00, the second a monthly total of $20.00, and the third (and final) transaction a monthly total of $30.00. By the way, most of these products have WAY more than 3 transactions a month, so again I'm oversimplifying.
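The running-total-at-entry-time idea, sketched in Python rather than FileMaker (the tuple layout and names are hypothetical):

```python
from collections import defaultdict
from datetime import date

# Latest month-to-date total per (product, year, month).
monthly_running = defaultdict(float)

# Each stored row: (product_id, date, amount, monthly_total_to_date)
ledger = []

def add_transaction(product_id, txn_date, amount):
    key = (product_id, txn_date.year, txn_date.month)
    monthly_running[key] += amount
    # Store the running total on the transaction itself, as proposed:
    # the work happens once, at data-entry time.
    ledger.append((product_id, txn_date, amount, monthly_running[key]))

for _ in range(3):
    add_transaction("P1", date(2024, 3, 10), 10.00)

# The last transaction of the month carries the month's final total.
print(ledger[-1][3])  # 30.0
```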
For reporting, my thinking was that I could then use a relationship to find the last record of each month (a relationship matching all transactions for the month, with a descending sort on the related TO to bring the last value to the top) and grab that one stored "monthly running total" value to get the total sales for the month. So instead of the system having to sum the three records (again, normally hundreds a month, not three), I just grab the one "monthly total" value from the last transaction of the month.
My first question: is there a "faster" way to do this, or is this a good method? We are basically doing the work at data-entry time instead of waiting to do it at reporting time.
I don't know if this is possible, but I was also hoping to make the reporting somewhat script-less, so that when a user opens a report they don't have to wait for data to be gathered; the data just pulls in. I think that using the above relationship with some global month and global year fields, we can make this happen without scripts.
Now along the same lines, we run product reports to show "all time" totals. So instead of monthly reports, we want to know how much a particular product made over all time. The report usually contains 20 - 100 products and shows the grand total sales for each of them. This report is slower since we are now looking at a much larger data set. Originally I was thinking I would just get the "last" value for each month and sum those. This would still be relatively fast, since we'd be summing 12 monthly values x 10 - 20 years of data instead of hundreds of thousands of records for each product. But I can't figure out a way to do this summarization without a looping script, since I don't think I can set up a relationship (or relationships) to get the last value of each month over all time?
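What I'm picturing, again as a Python sketch rather than FileMaker (it assumes transactions arrive in chronological order, so the last row seen for a month holds that month's final running total):

```python
def all_time_total(transactions):
    """transactions: iterable of (year, month, monthly_running_total),
    in chronological order."""
    last_per_month = {}
    for year, month, running_total in transactions:
        # Later rows overwrite earlier ones, so each month ends up
        # holding its final (i.e. last) running total.
        last_per_month[(year, month)] = running_total
    # Sum one value per month instead of every transaction.
    return sum(last_per_month.values())

txns = [
    (2024, 1, 10.0), (2024, 1, 20.0),  # January ends at 20
    (2024, 2, 5.0),                    # February ends at 5
]
print(all_time_total(txns))  # 25.0
```

The question is whether something like this per-month "last value" lookup can be expressed purely with relationships, or whether it forces a loop.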
One option (could this be faster?) would be to store not only the monthly value but also, in a new number field, the running total over all time. This would make the reporting really fast, but might make entering a transaction slow. I'm guessing it would only add 5-15 seconds to each transaction, but sometimes they import sales data with over 10,000 products at a time, so 5-15 seconds x 10,000 would really slow down those imports.
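In principle the incremental cost of the second field is tiny: one extra addition per row, applied in the same pass as the monthly total. A hedged Python sketch of a batch import maintaining both fields (names are made up; whether FileMaker's per-record overhead behaves this way is exactly my question):

```python
from collections import defaultdict

monthly = defaultdict(float)   # (product, year, month) -> running total
all_time = defaultdict(float)  # product -> running total over all time

def import_batch(rows):
    """rows: iterable of (product_id, year, month, amount).
    Returns stored rows with both running totals attached."""
    stored = []
    for product_id, year, month, amount in rows:
        monthly[(product_id, year, month)] += amount
        all_time[product_id] += amount  # one extra addition per row
        stored.append((product_id, year, month, amount,
                       monthly[(product_id, year, month)],
                       all_time[product_id]))
    return stored

rows = import_batch([("P1", 2024, 1, 10.0), ("P1", 2024, 2, 10.0)])
print(rows[-1][5])  # all-time running total: 20.0
```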
Is there a better way to handle this?