Testing the performance implications of variables and label referencing versus direct expressions

By Barry Harmsen

Testing the performance implications of variables and label referencing versus direct expressionsAfter my previous post about variables, I got an interesting question from DV. He asked me about the reuse of chart expressions by referencing the label of another expression (“label referencing”), and what the performance implications of using variables and label referencing versus direct expressions are.

I use variables and label referencing extensively in my applications, but I never really tested what this means for performance. I have always assumed that using variables instead of direct expressions would have a slight impact on performance. I also suspected that using label referencing would result in significantly better performance (I will explain this later).

But was this really true? Triggered by DV’s question, I set up a small experiment to test my assumptions.

For those of you that are in a hurry or do not want to read a lot of text before reaching an unexciting conclusion: My experiment seems to indicate that there is a very, very small performance cost when using variables or label referencing instead of direct expressions. For all intents and purposes this difference is negligible however, resulting in only a roughly 0.3% slower calculation for the slowest option.

For those of you that want to see how I did it, and maybe pick up some techniques for identifying and resolving performance issues in your QlikView applications: Read on!

The set-up

As performance issues with calculation of objects are usually not an issue with small data volumes, I decided that I would need a moderate-to-large dataset to test with. I generated a 500 million row CSV file containing the following data:

  • Customer: random customer names between AAA and ZZZ, resulting in 17,576 unique values;
  • Product: random products named A to Z, 26 unique values;
  • Year: random year from 2008 to 2011;
  • Actual revenue: random number between 0 and 1,000,000;
  • Target revenue: random number between 0 and 1,000,000.

This resulted in a 12.5 GB CSV file, that I then loaded into QlikView. (It also turned into a 4 GB QVD file, but that is the subject for another post)

I then created a very big pivot table containing actual and targeted revenue for each customer by product and year. Additionally, I created a calculated field to compare actual versus targeted revenue.

Very big pivot table, containing 500 million records

The idea was to create an object that contained many records and calculations, so that any differences in calculation speed between the various calculation options would be amplified.

The calculation options

For the expressions, I tested three different options to see how the calculation time would be affected:

  • Direct expression: this is the straightforward option, directly typing the expression into the expression definition:
    • Actual revenue: sum(Actual)
    • Target revenue: sum(Target)
    • Actual vs Target: sum(Actual) / sum(Target)
  • Label referencing: this option is the same for the first two expressions, Actual and Target revenue.  However, for the third expression (Actual vs Target) calculation, instead of using the Sum() function we reference the earlier two results by their expression label names. My impression was that this might calculate faster, because for the last expression QlikView does not need to calculate the sums of the actual and target revenue again. Instead it can use the results that were calculated in the other two expressions:
    • Actual revenue: sum(Actual)
    • Target revenue: sum(Target)
    • Actual vs Target: [Actual revenue] / [Target revenue]
  • Variables: Instead of using direct expressions, this option uses variables to store the calculations. I suspected that this method might be a little bit slower, as QlikView would also have to perform Dollar-sign expansion:
    • Actual revenue: $(vActual)
    • Target revenue: $(vTarget)
    • Actual vs Target: $(vAT)

The first test

My initial idea was to create three copies of the pivot table, each containing a different calculation option. After making a selection, I would check the CalcTime for each of the pivot tables via the Objects tab in the Sheet properties dialog.

Sheet objects

There was a problem however. It looks as if the CalcTime is not showing the individual calculation time for an object, but rather the cumulative calculation time when the object finished updating. So the first pivot table to be calculated would show a calculation time of 30 seconds, while the second would show 60 seconds. It would actually not take 60 seconds for the second table, but since it had to wait until the first table was calculated that time was being added as well.

The second test

As the three-in-one approach was not going to work, I decided to create three identical applications with only the calculations being different.

I then proceeded to make the same 25 selections (in the same order) in each of the three applications. Afterwards I exported the memory statistics via the General tab in the Document Properties dialog.

Export memory statistics

This function exports a text file containing memory usage statistics for the various parts of the QlikView document. These .mem files can then be loaded and analyzed in the QlikView Optimizer application. This application originally came with the training materials for QlikView 8.5, but is still compatible with version 10. You can download it by clicking here.

QlikView Optimizer result for the pivot tables

As you can see in the result above, the direct expression has the lowest average calculation time. The variable ends in second place. Label referencing turned out to be the slowest option, which surprised me. Overall the difference between the fastest and slowest option was 0.33%.

Conclusion

So, what did I learn? Well, my suspicion that using variables instead of direct expressions impacts calculation time seems to be confirmed. However, the difference is so small that the it is virtually unnoticeable to the user. I was surprised that label referencing turned out to be the slowest option. Although at 0.33% slower, this is hardly a problem.

While I did not gain any shocking insights from this experiment, it is nice to know that performance is not an issue when picking which option to use. You are free to pick the option that best fits the situation.

For those of you interested in generating random data, here is the script I used.

Download the data generation script

As always, if you have any comments, questions or critiques on the unscientific method I used ;), feel free to drop them in the comments below.

About The Author

Barry Harmsen

Hi there, I'm Barry and I'm a Business Intelligence Consultant at Bitmetric and based in the Netherlands. Originally from a background of 'traditional' Data Warehousing, Business Intelligence and Performance Management, for the past 10 years I have been specializing in Qlik and a more user-centric form of BI. I have done numerous QlikView and Qlik Sense implementations in many different roles and industries. In 2012 I co-authored the book QlikView 11 for Developers. You can follow me on Twitter at @meneerharmsen.

6 Comments

  • 1
    DV
    July 19, 2011 - 09:25 | Permalink

    Thanks so much for taking time to test. This is very useful to know. Especially I am not aware of memory optimiser tool. Thanks

  • 2
    DV
    September 16, 2011 - 11:52 | Permalink

    Hi Barry,

    I hope you are doing well. I was testing the performance implications of Normal Chart vs Minimized Chart vs Conditional Hidden Chart. So far I have realised that both the Minimized and Hidden Charts are not evaluated by QV Engine. But I am not sure if this is right. Please can you give some ideas?

    Many thanks in advance.

    Cheers – DV

    • 3
      September 19, 2011 - 21:34 | Permalink

      Hi DV,

      I’ve not really tested this, but have noticed that sheets update faster when a ‘slow’ chart is minimized. When you try to restore the slow chart it also takes a while before it is shown, probably because it is being calculated at that moment. I think it is a safe bet that QlikView only calculates visible charts.

      Cheers,
      Barry

  • 4
    Ralph
    May 1, 2013 - 21:42 | Permalink

    Barry, whilst there is a slight performance impact by using variables instead of direct expressions do you have a preference as to what you use in charts?

    In one particular project I used variables in charts for the simple reason that it was easier for end users to understand if they wanted to create their own queries rather than them using set analysis. I’ve probably answered my own question in that its probably on a case by case basis as to which method to use but it would be good to know if you had a personal preference?

    Thanks,

    Ralph

  • 6
    March 9, 2014 - 16:57 | Permalink

    Great post… this notes could be helpful too… (in spanish)
    http://qlikviewapuntes.blogspot.com.ar/2014/03/apuntes-sobre-performance.html

    regards

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    Read previous post:
    Not all variables are created equal

    It has been a while since my last post. To get back in the habit of regular updates, I am...

    Close