Today was the last day of the Masters Summit for Qlik in Johannesburg and we all had a great time. One of the things I very much enjoy about these events is the chance to speak with so many experienced people who are enthusiastic about Qlik and keen to share their knowledge. I always go home with some cool new tricks and insights.
What I want to share with you today is an alternative script for building a link table. This was shared with me by Rene McGregor and the nice folks over at QlikView South Africa (who have also been very supportive in bringing the Masters Summit to South Africa, much appreciated).
The link table
A link table is used to associate multiple fact tables within a Qlik data model without creating synthetic keys or circular reference (an alternative is the concatenated fact table). Without going into too much detail, the steps to create a link table are:
- Create a compound link key based on the common dimension keys between the fact tables;
- Create a separate link table that contains the common dimension keys and the link key;
- Remove the individual common dimension keys from the fact tables.
The image below shows a simplified example of a link table and two fact tables.
Building the link table, the traditional way
The traditional way to build the actual link table (step 2 described above) is to fetch the link key and common dimensions from every fact table using a LOAD DISTINCT and concatenating all the results together. Some people might even perform a LOAD DISTINCT on the resulting link table, but that is unnecessary because using the DISTINCT keyword anywhere on a table will always ensure that the end result is distinct.
While this approach is simple and straightforward, when you are dealing with larger volumes of data, LOAD DISTINCT can be quite time consuming. My preference for larger data volumes is to use a concatenated fact table, but there are cases where a link table cannot be avoided. Rene showed me an alternative scripting solution to create these link tables much faster.
Building the link table, using FieldValue() and FieldValueCount()
FieldValue() and FieldValueCount() are functions that directly operate on the symbol tables, This means that rather than processing the data tables, which might contain duplicate values, we get direct access to all the unique values of a field. When you have a large table, the difference in processing time can really add up. For this reason, most modern calendar scripts use the FieldValue() method for retrieving min and max values, rather than processing the entire table.
What Rene showed me is that this technique can also be applied to link tables, using the following statement:
This small piece of script will create a table containing every distinct link key value. On a subsequent preceding load, we can then split out the link key into the individual dimension keys using the SubField() function. The full script will then look like this:
This solution performs much faster on bigger data volumes and uses less lines of code, a very elegant solution!
I have included this solution in an example file, which you can download below.
What are your thoughts on this approach? Do you see any other areas where these functions/techniques might be useful? I’d love to hear your ideas about this, so feel free to comment below.