In my last post I discussed SSAS processing. I showed you some counters that can help identify when processing is taking place, and at what rate, and I gave some suggestions on how you might optimize your processing strategy to improve overall performance on your server. In this post I’m going to talk about unoptimized MDX code as a source of performance issues.
There are quite a few resources available on how to write MDX, so I won’t be going into that here. The point I want to make is that there are ways you can identify queries running on your SSAS instance that may benefit from optimization, even without actually knowing MDX. In this section I’ll mention a handful of key performance counters that will allow you to identify issues related to query optimization. You can then take this knowledge to the developers of the queries with useful feedback as to how to improve performance.
Specific to MDX, a few performance counters can provide a lot of insight. I’ll group them into three general issues. The first group is:
· MDX: Total cells calculated
· MDX: Number of calculation covers
· MDX: Total Sonar subcubes
A high number for any of these counters while a query is executing suggests the query is using cell-by-cell calculation instead of block-oriented (also known as “subspace”) evaluation. So what does that mean? To stick with our SQL analogy from before, you can compare cell-by-cell calculation to using cursors in SQL, which is often an inefficient and resource-intensive way of processing data. Maybe you’ve heard the term “RBAR” in SQL development, which stands for “Row By Agonizing Row”. Well, you could call this CBAC, or “Cell By Agonizing Cell”.
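To make this concrete, here is a sketch of the kind of rewrite the whitepapers suggest. The cube, dimension, and measure names below are invented for illustration, not taken from any real database. A condition buried inside FILTER() tends to force the engine to evaluate the predicate one cell at a time, while moving the condition into an IIF() with a NULL branch is a pattern the whitepapers cite as letting the engine stay in block mode:

```mdx
-- Cell-by-cell prone: FILTER() inside an aggregate evaluates the
-- predicate for each cell individually.
WITH MEMBER [Measures].[Large Sales] AS
    SUM(
        FILTER([Product].[Product].MEMBERS,
               [Measures].[Sales Amount] > 1000),
        [Measures].[Sales Amount]
    )
SELECT [Measures].[Large Sales] ON COLUMNS
FROM [Sales]

-- Block-friendly rewrite: the IIF() with a NULL branch lets the
-- engine operate on the whole subspace at once.
WITH MEMBER [Measures].[Large Sales] AS
    SUM(
        [Product].[Product].MEMBERS,
        IIF([Measures].[Sales Amount] > 1000,
            [Measures].[Sales Amount],
            NULL)
    )
SELECT [Measures].[Large Sales] ON COLUMNS
FROM [Sales]
```

If you run a query like the first form and watch the counters above climb, that’s a good candidate to hand back to the query’s developer.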
The MDX performance whitepaper I mentioned in the first post in this series goes into great detail about this. There have been many enhancements to block-oriented evaluation mode since then and updated details can be found in Performance Improvements for MDX in SQL Server 2008 Analysis Services.
Another counter that can indicate serious performance issues with MDX:
· MDX: Total recomputes
This counter tracks errors encountered in calculations during query execution. A non-zero value can indicate that unnecessary recalculations are taking place, and the engine may even fall back to cell-by-cell mode in an effort to resolve the error.
Jeffrey Wang has a great blog post that walks through some examples of where recomputes can have a dramatic impact on performance.
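As a rough illustration of the kind of pattern involved (again with invented cube and measure names, so treat this as a sketch rather than a reproduction of his examples): a calculation that can raise a cell-level error, such as a division by zero, may force the engine to recompute the affected cells, while guarding the denominator avoids the error up front:

```mdx
-- Can raise a division-by-zero error for cells where
-- [Sales Amount] is zero, which the engine must then handle.
WITH MEMBER [Measures].[Margin Pct] AS
    [Measures].[Gross Profit] / [Measures].[Sales Amount]
SELECT [Measures].[Margin Pct] ON COLUMNS
FROM [Sales]

-- Returning NULL for the zero case avoids the error
-- (and the recompute) entirely.
WITH MEMBER [Measures].[Margin Pct] AS
    IIF([Measures].[Sales Amount] = 0,
        NULL,
        [Measures].[Gross Profit] / [Measures].[Sales Amount])
SELECT [Measures].[Margin Pct] ON COLUMNS
FROM [Sales]
```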
One more set of counters that I think gives us a good look into MDX performance:
· MDX: Total NON EMPTY unoptimized
· MDX: Total NON EMPTY for calculated members
In a nutshell, SSAS data tends to be sparsely populated. Take sales order data as an example: a typical sales order is unlikely to include every item available for purchase, so for any given order there will be a lot of empty cells for the items that weren’t purchased.
You’ll remember I mentioned earlier the performance impact of cell-by-cell calculation. When you have sparse data, applying NON EMPTY filtering can also speed up performance, but there is actually more than one code path for this algorithm, and the slower code path can cause performance degradation. This behavior was improved in SSAS 2008, but it can still occur. The counters above can help you identify when that’s the case. More details can be found in the NON_EMPTY_BEHAVIOR section of the Analysis Services MOLAP Performance Guide for SQL Server 2012 and 2014.
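For reference, this is what the NON EMPTY axis keyword looks like in a query (hypothetical names again). When the filtered axis involves calculated members, you would expect the MDX: Total NON EMPTY for calculated members counter to move, which is a hint the engine may have had to use a slower code path:

```mdx
-- NON EMPTY drops rows where every cell on the axis is empty.
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    NON EMPTY [Product].[Product].MEMBERS ON ROWS
FROM [Sales]
```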
Improving MDX Performance
So to wrap up this edition of the series, what are some primary strategies for optimizing MDX performance? First, proper aggregations and partitions can make a big difference, especially if the bottleneck is in the storage engine. There is a ton of information on aggregation strategy that I won’t repeat here, but to reuse our analogy, aggregation strategy is similar to SQL index strategy: there is a sweet spot, and too many aggregations can be just as bad as too few. Aggregations provide pre-calculated data that can improve query performance, but they must be maintained, and too many will increase both the resource requirements during processing and the storage requirements for all of that extra data.
If the bottleneck appears to be in the formula engine, watch for cell-by-cell calculation, and eliminate empty cells and tuples using the strategies recommended in the whitepapers linked above.
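One common way to eliminate empty tuples in the formula engine’s favor (hypothetical names once more): prefer NONEMPTY() over an equivalent FILTER(... NOT ISEMPTY(...)) expression, since the FILTER form tends to walk the set cell by cell:

```mdx
-- Cell-by-cell prone: the predicate is checked per member.
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    FILTER([Customer].[Customer].MEMBERS,
           NOT ISEMPTY([Measures].[Sales Amount])) ON ROWS
FROM [Sales]

-- Lets the engine remove empty cells in bulk.
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    NONEMPTY([Customer].[Customer].MEMBERS,
             [Measures].[Sales Amount]) ON ROWS
FROM [Sales]
```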
So at this point you might be wondering: aside from the counters we just discussed, how do we know where the main bottleneck is, the formula engine or the storage engine? Where should we focus our performance troubleshooting? I’ll help you further pinpoint those efforts in the next post.