
Wednesday, March 28, 2012

Increase Aggregation Percentage

Is there a way to force the aggregation design in BIDS to 100%? All of my dimensions have attribute relationships defined (as indicated in the designer), yet I can only get to around 35% in the aggregation design.

My main issue is an Excel pivot table the user wants to create with three row dimensions. When the pivot table is first built, access is fast, but when the user drills down to the bottom level of each dimension, the query takes a long time.

We have a Hyperion Essbase/Analyzer report that shows the same information rather quickly. The differences I can see are that Analyzer has a "list bottom" function and that the cube space in Essbase is very large (I think Essbase builds an aggregation for everything, hence the faster access).

Aside from increasing the aggregations, I don't know what else to try. I've read the Project REAL paper and applied its tips where possible. I'm using the Standard Edition of SQL Server 2005, so my options are limited.

In the upcoming Service Pack 2 release of Analysis Services you will see a new sample application that allows you to build a set of aggregations based on the information logged in the QueryLog. You will browse your cube, and then you can design aggregations that cover the set of queries you'd like to see run faster.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

What does the aggregations figure under Partitions in BIDS actually mean? I took some dimensions out of my cube, made some other changes, and now get 99% optimization. Yet performance has not improved to an acceptable level. Again, these are queries down to the lowest level in Excel.

The fact table has about 4 million records (three cubes share this data). Would it make any difference if I created separate SQL fact tables, one for each company? Right now I'm limiting the records with a query in the data source view. All of my cubes are MOLAP, so I assumed this wouldn't affect cube performance once the cubes are processed.

It also seems that more aggregations are created when I don't define attribute relationships between the levels. Shouldn't performance be best when attribute relationships are defined for all hierarchy levels?

I also tried restricting the data to a seven-month period (the current year), and still no luck.

|||

You can't preaggregate everything in any database. If you have, say, 10 dimension levels of 10 members each, the possible aggregation size across those levels is 10^10 cells. Unless you are sitting on a very large base data table, this translates into an aggregation that IS your base data (since that is the actual number of non-empty combinations that exist). It's a complete waste to materialize such an aggregation, so it isn't done. The fewer rows of base data you have, and the more dimensions and levels, the fewer of all the possible aggregations actually make sense.

What you are seeing is that the top levels are aggregated and come back quickly because they contain far fewer rows than the base data. When you drill down to the bottom, you leave the aggregations and walk through the base data instead - hence the slowness. This isn't due to a sloppy implementation but a mathematical certainty that holds for all preaggregated databases. It would be a great service if someone included an explanation of this in the official documentation, as people who aren't familiar with cubes usually assume they are insanely fast for everything and then get very disappointed when they're not.
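
A quick way to see this effect in your own data is to compare the fact row count with the number of distinct leaf-level combinations. The table and column names below are placeholders; if the two numbers come out close, a leaf-level aggregation would be nearly as big as the fact table itself, which is exactly the situation described above.

-- Placeholder names: substitute your own fact table and leaf-level key columns.
SELECT
    (SELECT COUNT(*) FROM dbo.FactAmounts) AS FactRows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT District, Acct, YearMo FROM dbo.FactAmounts) AS leaves
    ) AS LeafCombinations;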

I haven't used Essbase, but I have used Excel, and my guess is that the speed difference you are seeing is mainly due to Excel, as a client, writing absolutely horrendous MDX, making queries much slower than the server can manage if you write the MDX yourself. Unfortunately not much can be done about this except waiting for Excel 2007 and hoping it's better, or switching to another client.

As for the optimization number, I'm a little curious myself, but I think it is the percentage of possible queries that can be answered from aggregations. If you take away enough dimensions, eventually you have so few that every combination leads to a sensible aggregation, and thus a very high percentage. Still, when you drill down you hit aggregations that are very large - it should be faster than before, but perhaps not sufficiently fast for your purposes.

|||

Is this a different feature from the usage-based optimization that is available for partitions? Can you elaborate and clarify the differences?

Thanks

|||

The aggregations themselves are identical and work as described above. There is a set of aggregations that "makes sense", but once we put a limit on their combined size, we have to choose which ones to materialize and which not.

There are two wizards for doing this. One is used at design time and assumes that all queries are equally likely. I assume (it makes sense, though I don't know for certain) that it sorts the candidates by estimated size and then takes as many as will fit, starting with the smallest.

The other wizard is the usage-based one you're talking about. It looks through the query log and prioritizes aggregations that are actually used by queries. That means you might drop several smaller unused aggregations to make room for a larger one that is actually used.
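
If you want to see what the usage-based wizard has to work with, you can look at the query log directly. The sketch below assumes query logging has been switched on in the server properties and that the default OlapQueryLog table and column names are in place; adjust it to your own setup.

-- Assumes the default query log table created by Analysis Services when logging is enabled.
-- Each row records a sampled query and the attribute combination (Dataset) it touched.
SELECT TOP 50
       MSOLAP_Database,
       MSOLAP_User,
       Dataset,
       StartTime,
       Duration
FROM   dbo.OlapQueryLog
ORDER BY Duration DESC;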

|||

Ravel, I see what you mean about the number of combinations that need to be aggregated. As mentioned, the report the user wants has three dimensions at their lowest level on the rows, plus two filter and two column dimensions. On further analysis of the report, it looks like we would be better off writing a query against the SQL database. However, I would lose the budget variance and year-to-date cube members.

In an effort to improve response time, I've been trying to create smaller cubes. As mentioned, I don't have Enterprise Edition, so I cannot use partitioning. If I limit the number of records by creating a named query in the data source view, is it essentially the same as creating separate tables of the limited data? In other words, is cube processing speed the only thing I'm sacrificing by using named queries in the data source view?
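
For what it's worth, a named query in the data source view is just a SELECT statement that the cube is built from instead of the physical table, so limiting the rows looks something like this (table, column, and filter values are made up to match the fields mentioned in this thread):

-- Hypothetical named query: only one company's rows for a seven-month window reach the cube.
SELECT District, Acct, Scenario, YearMo,
       AcctSeg1, AcctSeg2, AcctSeg3, AcctSeg4, AcctSeg5, AcctSeg6,
       Amount
FROM   dbo.FactAmounts
WHERE  District = '01'                    -- stand-in for the per-company filter
  AND  YearMo BETWEEN 200601 AND 200607;  -- the seven-month period mentioned earlier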

Finally, does anyone have any comments about my setup? I have one fact table that looks like this:
District, Acct, Scenario (actual, budget, budget update), YearMo, Acct Seg1, Acct Seg2, Acct Seg3, Acct Seg4, Acct Seg5, Acct Seg6, Amount. Every field in the fact table represents the lowest level of detail in a related dimension table.

My dimension members in SQL look like:
District, Acct (level 0), Acct (level 1), Acct (level 2), Acct (level 3), Acct (level 4)
District, Acct Seg1 (level 0), Acct Seg (level 1), Acct Seg (level 2)
Repeat for Acct Seg 2 to 6, Time, Scenario

Each of the dimension tables joins to a fact table field. Each dimension record in SQL contains all of the member rollup information. All dimensions have attribute relationships defined. I cannot think of a simpler design, but perhaps the simple design is not the optimal design. However, am I correct to assume that once the cubes (MOLAP) are processed, the Excel queries do not access the source SQL Server tables?
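
For readers trying to picture the layout described above, here is a minimal sketch with made-up data types and only the Acct dimension spelled out; the real tables naturally have more columns.

-- Fact table: every field is a leaf-level key into a dimension table, plus the measure.
CREATE TABLE dbo.FactAmounts (
    District varchar(10) NOT NULL,
    Acct     varchar(20) NOT NULL,  -- joins to DimAcct.Acct (level 0)
    Scenario varchar(20) NOT NULL,  -- actual, budget, budget update
    YearMo   int         NOT NULL,
    AcctSeg1 varchar(20) NOT NULL,
    -- ... AcctSeg2 through AcctSeg6 ...
    Amount   money       NOT NULL
);

-- Dimension table: each leaf member carries its full rollup path,
-- which is what the attribute relationships are defined on.
CREATE TABLE dbo.DimAcct (
    Acct       varchar(20) NOT NULL PRIMARY KEY,  -- level 0, joins to FactAmounts.Acct
    AcctLevel1 varchar(20) NOT NULL,
    AcctLevel2 varchar(20) NOT NULL,
    AcctLevel3 varchar(20) NOT NULL,
    AcctLevel4 varchar(20) NOT NULL
);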

|||

"However, am I correct to assume that once the cubes (MOLAP) are processed, the Excel queries do not access the source SQL Server tables?"

Yes.

"In an effort to increase response time, I've been trying to create smaller cubes. As mentioned, I don't have the Enterprise edition so I cannot use partioning. If I limit the number of records by creating a named query in the data source view is it essentially the same as creating separate tables of the limited data? Meaning, is cube calculation speed the only thing I'm sacrificing by using named queries in the data source view?"

I'm not quite sure I understand you here, but with MOLAP you store all the data in the AS database as well. If you've used a named query to limit the number of rows, you will have less data both to build the cube from and to read through at query time, so you should get both faster processing and faster query results, at the expense of not having as much data to analyze as before. I have only used AS with partitions, so I can't really say how best to emulate them if you don't have them.

Friday, March 9, 2012

Inconsistent behaviour when setting up parameters

Hi guys,

I am having an issue setting up two parameters in the Report Designer (SSRS 2005). The report calls a stored proc, and these two parameters are optional: when the user runs the report I don't want them to enter anything for these parameters; both should be passed as null to the stored proc (changing the signature of the sproc is not an option).

So, in the Report Parameters dialog, I set both parameters to "Allow null value", "Internal", and a default value of null. This is when the first issue strikes: if I OK out of the dialog, save, and then go back into the dialog, the designer has "forgotten" that one of the params was internal (the check box is no longer checked) - but the other one is still okay. How can I stop the designer from forgetting things like this?

The second problem is that even though (at least one of) the parameters is set to internal, when I use IE to connect to the report server and run the report, the input fields for those params still show up - I don't want the user seeing them. Both param input fields are hidden if I run the report through the Reporting Services control (i.e. just right-click the report in Solution Explorer and select "Run"). Why the inconsistent behaviour? How can I prevent the user from seeing those fields when using IE? (If I just set them to hidden, I get the error "The 'poolList' parameter is missing a value" when I try to run the report.)

The RDL XML for these two parameters is:

<ReportParameter Name="poolList">
  <DataType>String</DataType>
  <Nullable>true</Nullable>
</ReportParameter>
<ReportParameter Name="poolGroupList">
  <DataType>String</DataType>
  <Nullable>true</Nullable>
</ReportParameter>

Nowhere in there do I see that the parameter is internal - where does the designer keep that sort of information?

Thanks for any answers!

sluggy

|||

A report parameter being internal means that it does not have a prompt. Hence, in the underlying RDL file, there is no <Prompt> element under the <ReportParameter> element. So, judging from the RDL snippet in your posting, everything looks correct.
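
In other words, taking the "no <Prompt> element" rule above at face value (the first parameter's name and prompt text here are just examples), the only structural difference between a prompted parameter and an internal one is that single element:

<!-- Prompted parameter: the <Prompt> element is what makes the input field appear -->
<ReportParameter Name="regionList">
  <DataType>String</DataType>
  <Nullable>true</Nullable>
  <Prompt>Region</Prompt>
</ReportParameter>

<!-- Internal parameter: identical, except the <Prompt> element is omitted -->
<ReportParameter Name="poolList">
  <DataType>String</DataType>
  <Nullable>true</Nullable>
</ReportParameter>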

Regarding the report server: I recommend that you delete the already-published report from the report server before republishing it with the updated parameter information. Otherwise the old and new parameter settings may get merged (because an administrator on the report server could have decided to change the default value, etc., for the published report).

-- Robert

|||

Hi Robert,

Thanks for the heads-up about the settings getting merged; I will watch out for it when this project goes live, as we won't have control over that report server.

I have fixed one of my problems. To get rid of the "The 'blah' parameter is missing a value" message, all I had to do was remove those parameters on the Parameters tab of the dataset dialog. Since they are named params and are assigned default values in the sproc, I didn't even need to try to pass them. This also means I no longer have those input fields showing up in IE but not in the control.
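
That lines up with how optional parameters usually look on the SQL side. The signature below is only a guess at the shape of the real proc (names other than @poolList and @poolGroupList are invented), but it shows why the dataset can simply leave those two out:

-- Hypothetical signature: because @poolList and @poolGroupList default to NULL,
-- a caller that omits them gets the defaults, so the report never has to pass them.
CREATE PROCEDURE dbo.usp_GetPoolReport
    @startDate     datetime,
    @endDate       datetime,
    @poolList      varchar(max) = NULL,
    @poolGroupList varchar(max) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    -- ... report query goes here ...
END;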

Although this still doesn't explain why the designer was persistently forgetting settings :)

sluggy

Inconsistency between Data Source Designer and underlying code

I have two data sources that were recently updated from SQL Express to full versions of SQL Server 2005. The connection strings have been changed, and the changes appear in the code, but the Data Source Designer still shows the SQLExpress portion of the connection string. This seems to be fouling up the SSIS packages that use these data sources. Has anyone else encountered this? If so, what can I do to fix it?

|||

Can you recreate them?

|||

Are you using configurations? If so, have you updated them to reflect the right connection strings?

|||

It turns out the problem was this: the project was checked out by the person who created the data sources. When he checked the data sources in, he did NOT check the project back in, and the project's definition contained the incorrect connection strings. Once the project was checked in, the issue was resolved. Thanks for your responses.
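
For anyone comparing their own project, the mismatch described above typically comes down to nothing more than the instance part of the Data Source; the server and database names below are made up, purely to illustrate the before/after:

Old (Express):  Data Source=MYSERVER\SQLEXPRESS;Initial Catalog=MyWarehouse;Integrated Security=SSPI;
New (full):     Data Source=MYSERVER;Initial Catalog=MyWarehouse;Integrated Security=SSPI;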

Friday, February 24, 2012

Inaccurate HTML Rendering

(RS 2000) I have 6 rows of text boxes in my body header running the width of the report. All look fine in the report designer and when rendered to Adobe. When I output to HTML, there are an extra two blank rows between the 5th and 6th text box. All the text box properties are exactly the same; I have tried deleting/recreating and shifting things around, and nothing seems to help. Thanks for any suggestions.

For anyone else that runs into the same problem, where the HTML output looks different from the designer or the other output formats: I found that slightly reducing the vertical height of all the text boxes fixed the problem.
    "Mike Harbinger" <MikeH@.Cybervillage.net> wrote in message
    news:%23zD8P27TGHA.5908@.TK2MSFTNGP14.phx.gbl...
    > (RS 2000)
    > I have 6 rows of text boxes in my body header running the width of the
    > report. All look fine in the report designer and when rendered to Adobe.
    > When I output to HTML, there are an extra 2 blank rows between the 5 and 6
    > TB. All the TB properties are exactly the same; I have tried
    > deleting/recreating, shifting around, nothing seems to help.
    > Thanks for any suggestions
    >