Friday Qlik Test Prep – Week 1 – Optimized Load

By Bitmetric Admin

For the new year, we’re doing something new at Bitmetric. Each Friday, on our LinkedIn Company Page, we’ll share a test question that is representative of what you’ll find on the Qlik Business Analyst, Data Architect or System Administrator certification exams. Including the strange or vague wording that’s sometimes found in these exams 😉

We’ll follow up each Monday with the correct answer, as well as some additional explanation and insights. We hope this will help many of you prepare for your Qlik certifications, or at the very least provide a bit of fun and discussion.

Last Friday, we posted the following question:

[Image: the test question with answer options A to D]

The correct answer is C.

This provides an optimized load of the QVD, which is the fastest way of loading a QVD into Qlik. If we look at the other options, we’ll see that answers A and B will also work, but neither results in an optimized load (more on that below). Answer D does not work at all: it creates a duplicate field in the table, which leads to a script error (field names must be unique within a table). Even without the script error, the expression would not limit the loaded rows; it would only set null values for countries that aren’t ‘The Netherlands’.

So why would you want an optimized load?

For speed! 🚀 An optimized QVD load is the fastest way to load data from a QVD into Qlik. And while even a non-optimized load from a QVD is typically much faster than loading from other sources, the difference between an optimized and a non-optimized load can be significant. For example, on a sample set of 22 million rows the optimized load was 3 times faster than a non-optimized load. Imagine the difference when you’re dealing with hundreds of millions of rows or if you need to load data from many different QVDs. This will save load time and will keep you from getting distracted while waiting for the reload dialog to finish 😉. Of course this also applies to server reload performance when you’re running scheduled tasks.

How do you ensure an optimized load?

Many operations will cause a QVD load to be non-optimized. To keep it optimized, limit your operations to:

  • Renaming fields (using an alias). You can also load the same field twice under a different alias. This can be useful to create a separate key field.
  • Omitting fields by not including them in the LOAD statement
  • Using a single WHERE EXISTS with a single parameter. So WHERE EXISTS([Country]) is OK, but WHERE EXISTS([Country], [ISO Country Code]) is not.
  • JOIN, KEEP or CONCATENATE with another table
  • LOAD DISTINCT will also keep a load optimized. The DISTINCT part is processed after the LOAD, however, so you might still want to think twice before applying it to very large QVDs.
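
As a rough illustration of the points above, here is a minimal sketch of a load that should stay optimized. The data connection (lib://Data), the file name (Customers.qvd) and the field names CustomerID and CustomerName are all hypothetical; only the Country field and the ‘The Netherlands’ filter come from the question.

```qlik
// Hypothetical connection, file and field names, for illustration only.

// 1. Seed the Country field with the value(s) we want to keep.
CountryFilter:
LOAD * INLINE [
Country
The Netherlands
];

// 2. Load from the QVD. Fields that are not listed are omitted,
//    CustomerName is renamed with an alias, and the single-parameter
//    WHERE EXISTS keeps only rows whose Country value is already loaded.
Customers:
LOAD
    CustomerID,
    CustomerName AS [Customer Name],
    Country
FROM [lib://Data/Customers.qvd] (qvd)
WHERE EXISTS(Country);

// 3. The helper table has done its job and can be dropped.
DROP TABLE CountryFilter;
```

Note that Country itself is loaded without an alias, since it is the field used in the WHERE EXISTS clause.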

The following operations prevent an optimized load, so if you want one, don’t do any of them:

  • Transform a field, for example with Upper([Country]) AS [Country Capital] or by using ApplyMap()
  • Use a WHERE clause other than a single WHERE EXISTS(). This is why answers A and B will not result in an optimized load.
  • Load data into a mapping table
  • Alias the field you’re using in the WHERE EXISTS clause
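
For contrast, the fragments below sketch two loads that would not be optimized, based on the rules listed above. They are independent examples rather than one script, and the connection, file name and CustomerID field are again hypothetical.

```qlik
// Hypothetical connection, file and field names, for illustration only.

// Not optimized: a WHERE clause other than a single WHERE EXISTS().
Customers_NL:
LOAD
    CustomerID,
    Country
FROM [lib://Data/Customers.qvd] (qvd)
WHERE Country = 'The Netherlands';

// Not optimized: a field is transformed while it is being loaded.
Customers_Upper:
LOAD
    CustomerID,
    Upper(Country) AS [Country Capital]
FROM [lib://Data/Customers.qvd] (qvd);
```

Both fragments return the expected data. They just take the slower, non-optimized route to get there.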

How can you check if a load is optimized?

Besides manually checking your script for the points listed above, the easiest way to check is to keep an eye on the script log, either in the data load progress window or the log file. If you see (QVD (row-based) optimized) then you’ll know you have an optimized load.


Should you always use an optimized load?

If it’s specifically asked on your Qlik certification exam then yes, definitely! In real life? It depends. Remember that 22 million row sample set with the impressive performance gains (3 times faster! 🚀) that we mentioned above? It went from 3 seconds to 1 second. In that scenario optimization is completely unnecessary.

Our advice is to focus on creating readable and maintainable scripts first (have you seen our coding conventions?), and to worry about optimized loads (and performance in general) only when it’s expected to become an issue. Or as Donald Knuth succinctly put it: “Premature optimization is the root of all evil”.

See you next Friday!