Merging tables: the JOIN prefix

By Barry Harmsen

Merging tables: the JOIN prefixThis is the second post in a series on merging tables in QlikView. In the previous post I wrote about appending rows to tables by using the CONCATENATE prefix, this time I will show you how to add  columns to a previously loaded table by using the JOIN prefix.

Table of contents

The story so far

Concatenated tablesIn the previous post we discussed the example of a paper company that wanted to merge the employee list of their Sales Department of one of their branches (the Stamford branch) into the employee list of another branch office (the Scranton branch). We merged the two tables by using the CONCATENATE prefix, demonstrating the difference between implicit and explicit concatenation along the way. The resulting table is shown on the right.

The new requirements

The corporate dataNow that we have a single, merged Employee table, the manager of the Scranton branch comes up with an additional requirement; wouldn’t it be a cool idea if we could link corporate’s HR data with the merged employee table so we also have everyone’s job description and pay scale available in our reports and dashboards? While I personally reserve the “cool idea” qualification for ideas that are really, well, cool (say, shark cage diving or designing your own heart valve implant) for the sake of this post lets assume that this is a cool idea and that we really want to implement it. So how do we proceed?

Just load the table

Tables loaded separately

The first option we could consider is just loading the table as-is. The resulting data model would look like the picture on the right. Technically, there is nothing wrong with using this approach but I prefer to merge the tables for two reasons:

  • It makes for a simpler, less-cluttered data model;
  • Logically, it makes more sense for employee-related data to be grouped together into a single employee dimension table.

To merge the columns from two (or more) tables we can use the JOIN prefix.

The JOIN prefix

When the JOIN prefix is put in front of a LOAD statement, it tells QlikView not to load the data into a new table, but instead join the data to a previously loaded table.

Join script example

The join is a natural join made over all the common fields. This means that QlikView compares all the common fields between the two tables and joins the rows from both tables when the values in the shared columns match.

Optionally, the name of the table to which the data should be joined can be specified between parentheses. If no table name is specified, QlikView assumes that the data should be joined with the last loaded table. Those who read the first post know that I am not a big fan of making assumptions in the load script, it makes the code harder to read and maintain. Therefore I always specify a table name.

Join types

When you look closely at the two tables that we want to join, you notice that not all Person ID’s correspond between the tables. The Employees tables contain an employee, Michael Scott, who is not present in the HR Data table. Conversely, the HR Data table contain data for a Receptionist who is not present in the Employees table.

Not all values between tables match

When we do a standard join, the tables are merged so that the corresponding records from both tables are shown on the same row. Records without a corresponding record in the other table get null (empty) values for those columns. The result is the following table:

Standard join

This is almost the merged table we want, but it still contains data for a Receptionist who is not an employee at the Scranton branch. Fortunately there is a way to solve this, by specifying the join type.

By using a join type we can specify how QlikView joins the tables, we can use the following options:

  • INNER: Only records which are present in both tables are included in the merged table. In our example this would mean that both Person ID 1 (Michael Scott) and 10 (Receptionist) would be excluded from the merged table.
  • OUTER: All records from both tables are included in the merged table, regardless if they have corresponding records in the other table. This returns the same result as the standard join.
  • LEFT: All records from the first table are included in the merged table, and only those records from the second table which have a corresponding value in the first table. In our example this would mean that Person ID 1 (Michael Scott) would be included and that Person ID 10 (Receptionist) would be excluded, as it has no corresponding value in the first table.
  • RIGHT: All records from the second table are included in the merged table, and only those records from the first table which have a corresponding value in the second table. In our example this would mean that Person ID 10 (Receptionist) would be included and that Person ID 1 (Michael Scott) would be excluded, as it has no corresponding value in the second table.

As we want to keep the entire Employee table and only add the corresponding values from the HR Data table to it, we decide to use the LEFT JOIN prefix.

Left join

This results in the following merged table:

Left joined table

And that’s how it’s done! The example QVW for this post shows the completed script and also shows what result you get when you use another join type. As always, if you have any questions or comments, feel free to drop them in the comments section.

Downloads

Download the example QVW and source data

About The Author

Barry Harmsen

Hi there, I'm Barry and I'm a Business Intelligence Consultant at Bitmetric and based in the Netherlands. Originally from a background of 'traditional' Data Warehousing, Business Intelligence and Performance Management, for the past 10 years I have been specializing in Qlik and a more user-centric form of BI. I have done numerous QlikView and Qlik Sense implementations in many different roles and industries. In 2012 I co-authored the book QlikView 11 for Developers. You can follow me on Twitter at @meneerharmsen.

26 Comments

  • 1
    Dinesh Sharma
    February 8, 2011 - 18:30 | Permalink

    Awesome article… thanks a lot for sharing…. What will be your next about and how long we have to wait for it ?

    • 2
      February 9, 2011 - 11:17 | Permalink

      Thanks. I am not yet sure what the next post will be about and when it will be there. I have a whole bunch of ideas and articles in various states of completion, but no fixed planning for posting/writing them.

      If you want to be kept up to date, if you sign up on the right-hand side (near the top of the page) you will be notified when a new article is posted.

      • 3
        chetan
        June 17, 2011 - 08:53 | Permalink

        I have question how would we manage self join. Hope you are aware of employee table of oracle. Can you tell more about it. How to handle in Qlikview

        • 4
          June 17, 2011 - 23:21 | Permalink

          Chetan,

          Yes, I am aware of the Employee table in Oracle, Scott’s working on that one right? 😉

          With a self-join you have multiple options, but the two that come to mind are:

          – If there are very few levels, say only an employee and a manager (2 levels), then you could load the table twice and use a left join, so in pseudocode:


          load
          empno,
          ename,
          mgr
          from employee;

          left join
          load
          empno as mgr
          ename as mgr_name
          resident employee;

          – If there are more levels, or if you do not know beforehand how many levels there are, you can use the “hierarchy” prefix. The QlikView help file explains this quite well. Basically, you tell QlikView which field contains the ID of the node (in this case Employee) and the field that contains the ID of the parent-node (in this case Manager). In pseudocode:


          hierarchy(empno, mgr, ename)
          load
          empno,
          ename,
          mgr
          from employee;

          This will “flatten out” the table. Hope this explains it adequately. I will probably expand upon this in a future blog post.

          Kind regards,

          Barry

  • 5
    Amarilis Flores
    February 17, 2011 - 20:28 | Permalink

    What if you want to put a value on the null field?
    Like Department ‘Management’ would have a salary scale of 10 and not null?

    Thanks.

    • 6
      February 17, 2011 - 21:42 | Permalink

      Then you would first concatenate the pay scale to the HR data table before joining it to the employee table. something like this:

      // Load the ID’s for all employees that are in
      // management and concatenate the table with the
      // corporate HR table.
      CONCATENATE ([HR Data])
      LOAD
      [Person ID],
      10 as [Pay scale]
      RESIDENT Employees
      WHERE Department = ‘Management’

      Hope this helps.

  • 7
    April 16, 2011 - 23:02 | Permalink

    Hola,

    enhorabuena por el artículo. De interés y muy completo. Parece sencillo lo que hace qlikview pero es impresionante!

    Un saludo

  • 8
    Pho3niX90
    June 23, 2011 - 10:25 | Permalink

    How would i go about using another tables fields in a seperate table, for example.

    DATA table;
    usd ammount,
    (usd ammount / Zar) as ConvUSD
    zar ammount,

    Cur Table:
    Zar,
    Usd,

    Hope you get the idea.. thanx

  • 9
    Henco
    September 10, 2011 - 13:53 | Permalink

    Very helpfull Barry. Thank you very much!

  • 10
    Arie
    September 16, 2011 - 09:42 | Permalink

    Great article, thanks!

    One thing which is still confusing me a bit is the use of drop tables, resident, etc. Could you perhaps write an article on the use of that?

    Thanks

  • 11
    Matt
    June 22, 2012 - 22:45 | Permalink

    Images? It appears that the images from this and all prior articles have disappeared. Makes it a little hard to visualize what you’re saying without them.

    • 12
      June 22, 2012 - 23:01 | Permalink

      Hi Matt,

      I’m using Dropbox to host my image files. Perhaps Dropbox is blocked on your network? (I’ve seen some company networks blocking Dropbox) If you drop me a line through the contact form (at the top of the page) I will send you a PDF of this post.

      Cheers,
      Barry

  • 13
    Jeff
    July 12, 2012 - 19:46 | Permalink

    Barry, Excellent article!

    Along the lines of joining two tables…

    I have a large table that contains Incidents up to a point in time. The maximum value of the “Last Changed Date” is saved.

    I then want to extract tickets from the same data source where the “Last Changed Date” is greater than the highest value of the last extract.

    Then, merge the two tables (Original plus updates), removing duplicate Ticket Numbers by keeping the record with the higher “Last Changed Date” and dropping the old?

    This new combined table which will then become my Original (for the next run),

    Hope you can help,

    Regards, Jeff

    • 14
      July 12, 2012 - 22:40 | Permalink

      Hi Jeff,

      What you’re looking for is an incremental load. I am assuming that each incident has a unique ID, and that you have your previously loaded incidents stored in a QVD file. If this is the case, then the logic to use is:

      – Load all the new incidents from the source system since “Last Changed Date”
      – Concatenate load all existing incidents from the QVD where not exists(Incident ID)
      – Store the concatenated table to QVD

      The magic is in the second step, by only loading records with an Incident ID that was not loaded in the first step you discard all records from the existing table for which an updated version was found in the source system.

      Hope this helps.

      Cheers,
      Barry

  • 15
    dado
    July 17, 2012 - 16:43 | Permalink

    Hi Barry,

    Very good article…however i still dont understand how to link without forcing the same names in columns.

    for example:

    i have a file:

    Shop:
    id
    kk_id
    url

    and

    raw_log:
    creation_tstamp,
    uid,
    shopid,
    unique_name,
    affiliateid,
    number_of_clicks

    i need to link shop.kk_id to raw_log.affiliateid and raw_log.shopid… i m trying to create something but it just doest work, joins have to be right outer…i didnt think this is so complicated to do in QV…

    do you have any advise on how to do this? (without concatenation)

    regards,
    Dado

  • 16
    BAMaustin
    October 1, 2012 - 16:05 | Permalink

    Barry,
    The capture images for these articles are not being found on 09/28/02012 using IE8.
    So I can’t see your source code examples!

    Could you check to see if the paths have changed?

    Thanks.

  • 17
    Kumar
    April 9, 2013 - 23:38 | Permalink

    Barry:

    In the abocve example is it possible to just bring the unique values (that is Record 1 & Record 10) . Record 1 is in the 1st table but not in the 2nd table and record 10 is in the 2nd table but not in the first table

    thanks

  • 18
    Rob
    December 12, 2013 - 19:54 | Permalink

    Good stuff, but shouldn’t employee 2’s occupation be Assistant to the Regional Manager?

  • 19
    TV
    March 18, 2014 - 16:38 | Permalink

    Hi Barry,

    Thank you for this great article. I now understand the difference between merging tables using CONCATENATION and JOIN. What I would like to know is when to use which. In what situation one has to use JOIN and vice versa.

    If I have mix grain data, e.g., one table contains summary rolled up at the month level while the other table contain data at the transactional date level, should I join tables using CONCATENATION instead of JOIN?

    Thank you very much in advance for your insight.

    TV

  • 20
    ritu
    March 27, 2014 - 00:36 | Permalink

    thanks Barry.

    I had joined two table using left join and created a pivot of it. However for the value field which has null value (as there is no data in the right hand table correspond to left hand table), the entire record set is getting suppressed in the pivot when I am using the value field as an aggregate. However I want to display the records even with null value field in my pivot. How do I achieve that???

  • 21
    Jonathan Cohen
    June 24, 2015 - 15:01 | Permalink

    Hi Barry,

    Bought your book. Thanks.

    I have a problem, I am loading a list of accounts from an Excel spreadsheet.
    Then I want to load from a SQL table, data for only these accounts.
    So I am using an INNER KEEP, but its not working. (its loads everything..)

    Thanks

  • 22
    Auguste Moutima
    July 17, 2015 - 01:07 | Permalink

    Hi Barry,

    I have an assignment where I have to join, combine or put together two columns into one. the table is as follow:
    Unsubscribe obsolete
    1/20/2015 1/20/2015
    1/25/2015 1/25/2015
    1/28/2015
    2/04/2015 2/06/2015
    2/10/2015 2/10/2015
    2/16/2015
    3/14/2015 3/14/2015
    I want to create one column Cancellation date from this 2.

  • 23
    AKHILAF
    March 17, 2017 - 09:24 | Permalink

    Hi, I am new of this Qlikview . I have a small doubt.
    My Question Is
    I take a fields Region -N&W
    Year-2016
    Month-North(Jan,Feb,Mar,May,Jun,Jul,Aug,Oct,Nov)
    west(Jan,Feb,Mar,Apr,May,Jul,Aug,Nov)
    Sales
    Customer A (N)Region
    B(W)Region

    I need to calculate the sales as separate Of north and west regions By using Peek Function

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    Read previous post:
    Hands-on with QlikView Script Generator

    I have a confession to make. As someone who comes from a background of visual ETL tools (such as SAP...

    Close