Wednesday, March 28, 2012

Increase Aggregation Percentage

Is there a way to force the aggregation design in BIDS to 100%? All of my dimensions have attribute relationships defined (as indicated in the designer), yet I can only get around 35% aggregations defined.

My main issue is an Excel pivot table that the user wants to create, with three row dimensions. When the pivot table is first built, access is fast, but when the user drills down to the bottom level of each dimension, the query takes a long time.

We have a Hyperion Essbase/Analyzer report that shows the same information quite quickly. The difference I can see is that Analyzer has a list-bottom function, and the cube space in Essbase is very large (I think Essbase builds an aggregation for everything, hence the faster access).

Aside from increasing the aggregations, I don't know what else to try. I've read the Project REAL paper and applied its tips where possible. I'm using the Standard Edition of SQL Server 2005, so my options are limited.

In the upcoming Service Pack 2 release of Analysis Services you will see a new sample application that allows you to build a set of aggregations based on the information logged in the QueryLog. You browse your cube and then design aggregations that cover the set of queries you'd like to see run faster.
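For background, the QueryLog referred to here is the relational table Analysis Services fills in once query logging is turned on in the server properties (QueryLogConnectionString, QueryLogSampling, CreateQueryLogTable). A quick way to check what is being captured is a query like the one below; the table and column names are the defaults I have seen (OlapQueryLog, MSOLAP_Database, MSOLAP_User, Dataset, StartTime, Duration), so adjust them if your server is configured differently.

-- Look at the most recent entries in the Analysis Services query log.
-- Each row records which subcube a query touched (the Dataset string,
-- one digit per attribute) and how long the query took.
SELECT TOP (50)
    MSOLAP_Database,   -- Analysis Services database the query hit
    MSOLAP_User,       -- user who ran the query
    Dataset,           -- encoded subcube the query requested
    StartTime,
    Duration           -- how long the query took
FROM dbo.OlapQueryLog
ORDER BY StartTime DESC;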

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

What do aggregations imply under partitions in BIDS? I took some dimensions out of my cube, made some other changes, and now get 99% optimization. Yet performance has not increased to an acceptable level. Again, these are queries down to the lowest level in Excel.

The fact table has about 4 million records (3 cubes share this information). Would it make any difference if I created separate SQL fact tables, one for each company? Right now I'm limiting the records with a query in the data source view. All of my cubes are MOLAP, so I thought this wouldn't affect cube performance once the cubes are processed.
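To make that concrete, the query I'm using in the data source view looks roughly like the following; the table and column names (FactGL, Company) are simplified stand-ins for my real schema.

-- Named query in the data source view that limits the fact rows one cube sees.
-- FactGL and Company are placeholders; the real filter is whatever identifies
-- each company's rows in the shared fact table.
SELECT District, Acct, Scenario, YearMo,
       AcctSeg1, AcctSeg2, AcctSeg3, AcctSeg4, AcctSeg5, AcctSeg6,
       Amount
FROM dbo.FactGL
WHERE Company = 'CompanyA';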

It seems that more aggregations are created when I don't define attribute relationships between the levels. Shouldn't performance be best when attribute relationships are defined for all hierarchy levels?

I also tried restricting the amount of data to a seven month period (current year) and still no luck.

|||

You can't pre-aggregate everything in any database. If you have, say, 10 dimension levels of 10 members each, the possible aggregation size of those levels is 10^10 cells. Unless you are sitting on a very large base data table, that translates into an aggregation that IS your base data (since the number of non-empty combinations that actually exist can't exceed the number of base rows). It is therefore a complete waste to materialize this aggregation, and it isn't done. The fewer rows of base data you have, and the more dimensions and levels, the fewer of all possible aggregations actually make sense.

What you are seeing is that the top levels are aggregated and come back quickly, since they contain far fewer rows than the base data. When you drill down to the bottom you leave the aggregations and walk through the base data instead - hence the slowness. This isn't a sloppy implementation but a mathematical certainty, true of all pre-aggregated databases. It would be a great service if someone included an explanation of this in the official documentation, as people who aren't familiar with cubes usually assume they are insanely fast for everything and then get very disappointed when they're not.
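To put rough numbers on it, using the 10-levels-of-10-members example above against a fact table of about 4 million rows (the size mentioned earlier in this thread):

\[
10^{10} = 10\,000\,000\,000 \text{ possible cells}
\qquad\text{vs.}\qquad
\text{non-empty cells} \le \text{fact rows} \approx 4\,000\,000 .
\]

An aggregation at that grain can never be smaller than the fact table itself, which is why it is never built.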

I haven't used Essbase, but I have used Excel, and my guess is that the speed difference you are seeing is mainly due to Excel, as a client, writing absolutely horrendous MDX, which makes queries run a lot slower than the server can manage if you write the MDX yourself. Unfortunately not much can be done about this except waiting for Excel 2007 and hoping it's better, or switching to another client.

As for the optimization number, I'm a little curious myself, but I think it is the percentage of possible queries that can be answered from aggregations. If you take away enough dimensions, eventually you will have so few that every combination leads to a sensible aggregation, and thus a very high percentage. Still, when you do your drilldown you hit aggregations that are very large - it should be faster than before, but perhaps not sufficiently fast for your purposes.

|||

Is this a different feature from the usage-based optimization that is available for partitions? Can you elaborate on the differences?

Thanks

|||

The aggregations themselves are identical and as described above: there is a set of aggregations that "makes sense". However, if we put a limit on the combined size of all of these, we have to choose which ones to materialize and which not.

There are two wizards for doing this. One is used at design time and assumes that all queries are equally likely to happen. I assume (it makes sense, but I don't actually know) that this means it sorts the aggregations by estimated size and then takes as many as will fit, starting from the smallest.

The other wizard is the usage-based one you're talking about. It looks through the query logs and prioritizes aggregations that are actually used by queries. That means you might drop several unused smaller aggregations to make room for a larger aggregation that is actually used.
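If you want to see what the usage-based wizard has to work with, you can summarize the query log yourself. The sketch below assumes the default log table and columns mentioned earlier in the thread (OlapQueryLog with Dataset and Duration); it just shows which subcubes are requested most often and how slow they are.

-- Summarize the Analysis Services query log by subcube.
-- Frequently requested, slow subcubes are the ones a usage-based
-- aggregation design is meant to cover.
SELECT Dataset,
       COUNT(*)      AS QueryCount,
       AVG(Duration) AS AvgDuration,
       MAX(Duration) AS MaxDuration
FROM dbo.OlapQueryLog
GROUP BY Dataset
ORDER BY QueryCount DESC, AvgDuration DESC;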

|||

Ravel, I see what you mean about the number of combinations that need to be aggregated. As mentioned, the report the user wants contains three dimensions at the lowest level on the rows, plus two filter and two column dimensions. On further analysis of the report, it looks like we would be better off writing a query against the SQL database. However, I would lose the budget variance and year-to-date cube members.

In an effort to improve response time, I've been trying to create smaller cubes. As mentioned, I don't have the Enterprise Edition, so I cannot use partitioning. If I limit the number of records by creating a named query in the data source view, is that essentially the same as creating separate tables of the limited data? Meaning, is cube processing speed the only thing I'm sacrificing by using named queries in the data source view?

Finally, does anyone have any comments about my setup? I have one fact table that looks like:
District, Acct, Scenario (actual, budget, budget update), YearMo, Acct Seg1, Acct Seg2, Acct Seg3, Acct Seg4, Acct Seg5, Acct Seg6, Amount. All fields in the fact table represent the lowest level of detail in a related dimension table.

My dimension members in SQL look like:
District, Acct (level 0), Acct (level 1), Acct (level 2), Acct (level 3), Acct (level 4)
District, Acct Seg1 (level 0), Acct Seg (level 1), Acct Seg (level 2)
Repeat for Acct Seg 2 to 6, Time, Scenario

Each of the dimension tables joins to a fact table field. Each dimension record in SQL contains all of the member rollup information. All dimensions have attribute relationships defined. I cannot think of a simpler design, but perhaps the simple design is not the optimal design. However, am I correct to assume that once the cubes (MOLAP) are processed, the Excel queries do not access the source SQL Server tables?
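To make the layout concrete, a stripped-down T-SQL sketch of what I described is below. The names (FactGL, DimAcct, AcctLevel1 through AcctLevel4) are simplified, only the Acct dimension is shown, and the Acct Seg tables follow the same pattern.

-- Fact table: one row per lowest-level combination, keyed by leaf members.
CREATE TABLE dbo.FactGL (
    District  varchar(10)   NOT NULL,
    Acct      varchar(20)   NOT NULL,  -- leaf account (level 0)
    Scenario  varchar(20)   NOT NULL,  -- actual, budget, budget update
    YearMo    char(6)       NOT NULL,
    AcctSeg1  varchar(20)   NOT NULL,
    -- AcctSeg2 through AcctSeg6 follow the same pattern
    Amount    decimal(18,2) NOT NULL
);

-- Dimension table: each leaf member carries its full rollup, so attribute
-- relationships (level 0 -> level 1 -> ... -> level 4) can be defined in the cube.
CREATE TABLE dbo.DimAcct (
    Acct       varchar(20) NOT NULL PRIMARY KEY,  -- level 0, joins to FactGL.Acct
    AcctLevel1 varchar(20) NOT NULL,
    AcctLevel2 varchar(20) NOT NULL,
    AcctLevel3 varchar(20) NOT NULL,
    AcctLevel4 varchar(20) NOT NULL
);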

|||

"However, am I correct to assume that once the cubes (MOLAP) are processed, the Excel queries do not access the source SQL Server tables?"

Yes.

"In an effort to increase response time, I've been trying to create smaller cubes. As mentioned, I don't have the Enterprise edition so I cannot use partioning. If I limit the number of records by creating a named query in the data source view is it essentially the same as creating separate tables of the limited data? Meaning, is cube calculation speed the only thing I'm sacrificing by using named queries in the data source view?"

I'm not quite sure I understand you here, but with MOLAP you store all of the data in the AS database as well. If you've used a named query to limit the number of rows, you will have less data both to build the cube from and to read through at query time, so you should get both faster processing and faster query results, at the expense of not having as much data to analyze as before. I have only used AS with partitions, so I can't really say how to emulate them if you don't have them.
