A very elegant link table script

By Barry Harmsen

Some nice out of the box thinkingToday was the last day of the Masters Summit for Qlik in Johannesburg and we all had a great time. One of the things I very much enjoy about these events is the chance to speak with so many experienced people who are enthusiastic about Qlik and keen to share their knowledge. I always go home with some cool new tricks and insights.

What I want to share with you today is an alternative script for building a link table. This was shared with me by Rene McGregor and the nice folks over at QlikView South Africa (who have also been very supportive in bringing the Masters Summit to South Africa, much appreciated).

The link table

A link table is used to associate multiple fact tables within a Qlik data model without creating synthetic keys or circular reference (an alternative is the concatenated fact table). Without going into too much detail, the steps to create a link table are:

  1. Create a compound link key based on the common dimension keys between the fact tables;
  2. Create a separate link table that contains the common dimension keys and the link key;
  3. Remove the individual common dimension keys from the fact tables.

The image below shows a simplified example of a link table and two fact tables.

Link table example

Building the link table, the traditional way

The traditional way to build the actual link table (step 2 described above) is to fetch the link key and common dimensions from every fact table using a LOAD DISTINCT and concatenating all the results together. Some people might even perform a LOAD DISTINCT on the resulting link table, but that is unnecessary because using the DISTINCT keyword anywhere on a table will always ensure that the end result is distinct.

While this approach is simple and straightforward, when you are dealing with larger volumes of data, LOAD DISTINCT can be quite time consuming. My preference for larger data volumes is to use a concatenated fact table, but there are cases where a link table cannot be avoided. Rene showed me an alternative scripting solution to create these link tables much faster.

 

Building the link table, using FieldValue() and FieldValueCount()

FieldValue() and FieldValueCount() are functions that directly operate on the symbol tables, This means that rather than processing the data tables, which might contain duplicate values, we get direct access to all the unique values of a field. When you have a large table, the difference in processing time can really add up. For this reason, most modern calendar scripts use the FieldValue() method for retrieving min and max values, rather than processing the entire table.

What Rene showed me is that this technique can also be applied to link tables, using the following statement:

Link Table s

This small piece of script will create a table containing every distinct link key value. On a subsequent preceding load, we can then split out the link key into the individual dimension keys using the SubField() function. The full script will then look like this:

linktablescriptfull

This solution performs much faster on bigger data volumes and uses less lines of code, a very elegant solution!

I have included this solution in an example file, which you can download below.

Download the FieldValue() link table example

What are your thoughts on this approach? Do you see any other areas where these functions/techniques might be useful? I’d love to hear your ideas about this, so feel free to comment below.

About The Author

Barry Harmsen

Hi there, I'm Barry and I'm a Business Intelligence Consultant at Bitmetric and based in the Netherlands. Originally from a background of 'traditional' Data Warehousing, Business Intelligence and Performance Management, for the past 10 years I have been specializing in Qlik and a more user-centric form of BI. I have done numerous QlikView and Qlik Sense implementations in many different roles and industries. In 2012 I co-authored the book QlikView 11 for Developers. You can follow me on Twitter at @meneerharmsen.

11 Comments

  • 1
    Rene McGregor
    September 9, 2016 - 15:11 | Permalink

    Thanks for the mention Barry!!

    I’m excited to see other ways to use this method.

    Fully enjoyed the Masters Summit.

  • 2
    September 9, 2016 - 16:11 | Permalink

    Thanks, Barry

    That includes an improvement to the autogenerate part versus the While Not loop I described in http://qlikviewmaven.blogspot.com/2009/01/load-table-with-all-of-values-for-field.html

    Another performance improvement for people using this technique is to convert the key field to a number using the AutoNumber function and then drop the original key field. The numeric key fields are smaller and require less memory and, I suspect, slightly faster.

    -Tim

    • 3
      September 9, 2016 - 18:00 | Permalink

      Hi Tim,

      Wow! Wish I’d seen your post (from 2009!) sooner.

      I agree that AutoNumber would make it even nicer. What approach would you suggest for this? Initially, you’d have to have both the AutoNumber and the regular concatenated key in all fact tables, so that you can pick up the keys using FieldValue(), split them, and AutoNumber the key in the link table. Right?

      I see that you’ll be joining us in Austin next month, look forward to meeting you there!

      Kind regards,
      Barry

  • 4
    Digvijay Singh
    September 12, 2016 - 16:17 | Permalink

    Thanks for sharing this. Just one doubt, what if there are other dimensions which don’t need to be part of the composite key but are common between both fact table. I think in your book example you have this case. Now the question is how such non key element dimensions will be fetched.

  • 5
    Lorna Louw
    September 15, 2016 - 12:48 | Permalink

    hi Barry,

    was not so fortunate to be able to attend :(-

    I have a Qvw where-in I have to join multiple fact tables…and its quite complex – the tables dont have a lot in common (except for basics like date).

    I have posted on QlikCommunity, but I think due to the complexity, not a lot of feedback.

    May I bother you to assist, or may you guide me to someone who can?

    Regards
    Lorna

    • 6
      September 16, 2016 - 13:56 | Permalink

      Hi Lorna,

      Can’t make any solid promises, but if you use the contact form to send me an email with the problem then I might be able to take a peek. Again, no promises, swamped in work at the moment, but maybe it’s an issue I’ve seen before and have a quick solution for.

      Kind regards,
      Barry

  • 7
    David Dumas
    September 16, 2016 - 14:26 | Permalink

    Very clever solution. I have always used Autonumberhash128() for any compound key. It survives across qlikviews. When building Qlik data marts and correlating measures from multiple marts across common dimensions, you need to make sure the keys cannot change.

  • 8
    September 17, 2016 - 07:04 | Permalink

    The Best!!! Excellent work. I use a similar technique as a standard for generating master calendar. But beyond that I have not moved.

  • 9
    Carlos Castaño
    November 10, 2016 - 14:41 | Permalink

    Hola Barry, Buen tip, lo pondré en práctica, muchas gracias.

  • 10
    Buddy Bains
    June 15, 2017 - 18:19 | Permalink

    I’m new to Qlik (using the Sense product) and this article really helped me understand implementing Linked Tables. I have a question: I have 2 synthetic keys that occur in my model: FacilityName + TransactionDate and FacilityName + AccountNumber. I created a link table containing FacilityName + AccountNumber and it resolved that synthetic key. However, when I created a second link table to resolve the Facility Name + TransactionDate synthetic key, These two link table were associated by Qlik and I got new synthetic keys and a circular reference. My intuition is that I need only create a single link table containing FacilityName + TransactionDate + AccountNumber, I’m concerned that this will not work because some rows in it will contain NULL TransactionDate or AccountNumber (The FacilityName + accountNumber combination only occurs in a couple of instances and those table do not always the TransactionDate field).

    As an example, my fact tables look like this:

    FactTable1
    FacilityName
    TransactionDate

    FactTable2
    FacilityName
    AccountNumber

    FactTable3
    FacilityName
    AccountNumber
    TransactionDate

    Am I moving the the right direction? Any advice is appreciated.

  • Read previous post:
    Event tip: Dutch Information Visualization Event 2016

    Quick event tip: are you based in the Netherlands and interested in (tool agnostic) Data Visualization? If so, check out...

    Close