Never think you know it all! Today I have for you a little ‘gotcha!’ about the LOAD DISTINCT statement that I picked up from Oleg Troyansky at the Masters Summit for QlikView.
Common knowledge about LOAD DISTINCT is that it will remove duplicate records from the input table, similar to how the DISTINCT clause works in SQL. For example:
However, when used in combination with the JOIN or CONCATENATE statement, the results can be surprising, and even counterintuitive.
The spoiler
When combined with the JOIN or CONCATENATE statement, LOAD DISTINCT will not only remove all duplicates from the input table, but also from the resulting target table.
The case
Consider the following two tables; Sales and Customers:
You will notice that both tables contain duplicate records, the Sales table contains identical records for ID’s 2 and 3, and the Customers table contains two records for Mickey Mouse. If we JOIN these two tables together using the following script:
Then the result is the following table:
There are no surprises here. There are 4 rows for Mickey Mouse with a Sales Amount of 300, as both tables contained two duplicate rows and 2 x 2, the number of ways in which both records can be joined, equals 4. There are also 2 rows for Donald Duck with a Sales Amount of 400, as the Sales table contained duplicate rows for ID 3.
The DISTINCT LOAD
Now we add one small change to the script, we use LOAD DISTINCT to load the second table:
Since I already revealed the spoiler at the start of this post, that using LOAD DISTINCT with the JOIN or CONCATENATE statements will also remove all duplicates from the target table, you have probably guessed that the previous piece of script results in the following table:
The twist
Now that we’ve seen how LOAD DISTINCT impacts the resulting table, let’s make one small change to the script; instead of using LOAD DISTINCT on the second table we’ll use it on the first table:
Can you guess what the resulting table is going to look like? Think about it for a second before scrolling down further. When deduplicating the first table and then joining the second table (which also contains duplicates), the result must be something like the one shown below, right?
Gotcha!

Oh, and by the way, you’re actually a ghost.
In an M. Night Shyamalan-esque twist (ok, so not really), the result is actually exactly the same as when we used LOAD DISTINCT on the second table:
The conclusion
Whenever the DISTINCT clause is used on a table, the table’s records will stay distinct, no matter what you JOIN or CONCATENATE to the table. This can have all sorts of unwelcome consequences, such as duplicate-but-correct facts disappearing from your fact table. Of course, it can also have positive effects; this behavior is very useful when you need your link table to only contain distinct combinations
The advice
Be aware of this behavior in QlikView, if you need to do a DISTINCT LOAD combined with a JOIN or CONCATENATE statement consider if you might be losing duplicate rows, and if this is something that is desirable. If it’s not, then you should probably load the data into a temporary, resident table before joining it to your other data.
The example app
The example application for this post lets you load the example tables in a number of ways and displays the associated script and resulting table.
Download the example application
Got any other QlikView gotchas? I’d be very interested to hear about them, so please feel free to leave a comment.
20 Comments
Wow!
Never ran into that one before (that I know about!)
Could be very interesting to make it happen on purpose!
I was very surprised about this as well. I get how joining/concatenating a distinct table to another table might make the target table distinct as well (though that’s already quite counter-intuitive when you’re used to SQL), but that joining duplicate records to a table that was loaded distinct earlier still leads to a distinct table is really strange. It puzzles me why it would be implemented like this.
Good article! I have to check all my old code now, thanks for that 😉
Mmmmmh…
Me too I have to check some old code…
Many thanks for your well done explanation…;-)
Giulio
As a “source” for this gotcha, let me add my two cents to the story…
What’s important to remember is this: “DISTINCT will always work on the end result”. Whether you use it in a single LOAD or a series of auto-concatenating LOADs, or as part of a JOIN or a Forced-Concatenated LOAD, DISTINCT will always affect the final result.
So, sometimes it’s good. For example, when you are building a classical Link Table, you load a DISTINCT list of key fields from multiple data sources. Intuitively, after 2-3 loads from different sources, you want to reload the final table once again, to ensure that no keys were loaded twice. No need. Why? Because DISTINCT is always applied to the end result. Even if some of the keys exist in multiple data sources, that final DISTINCT in the last load will make the whole table DISTINCT.
Many times, it’s causing disasters, and that’s very well described in this article by Barry.
cheers
Thanks mate, you really saved my ass!
Nice explanation there, thanks!
Do you know if its possible to exclude certain fields from the distinct expression?
you cant overcame this problem by using:
Load RowNo() as ID_2,
… (then the other fields).
This way you always have a diferent number on the ID_2.
🙂
Hello Barry,
coming back to this older post. Today my colleague did a distinct load then droped a field from table and then stored the table to a QVD. BUT, rowcount in QlikView was different from rowcount in QVD. QVD records were not distinct, but data in QV were distinct. Try code below:
Temp:
LOAD * INLINE [
A, B
1, 1
1, 2
];
A: NOConcatenate LOAD Distinct A, B Resident Temp; DROP Table Temp;
DROP Field B;
STORE A into A.qvd (qvd);
It seems as if DISTINCT was applied only at the end of script processing, so it did not affect data when the table was stored.
good post..
But blind distinct deceives but identifying the fields and combining them and then having distinct will provide the soln many times.Thanks
Hello Sir,
How can we take a distinct count of distinct… That is
If jan1 = 10 ids
Jan 2 = 20 ids (5 ids common between jan1 and jan2)
Jan 3 = 30 ids ( 7 ids common between jan1, jan2 and jan3)
Then i should be able to remove the common ids.
Hence in the YTD graph it should come as
Jan 1 = 10
Jan 2 = 15
Jan 3 = 23
Please help me know how can I achieve this.
So, I want to show Duplicate values
I have Example like this
LOAD * INLINE [
Country, Sales
India, 5000
Us, 2500
Uk, 4000
India, 5000
Us, 5500
India, 4500
];
I want to show one straight table with India data like
India 5000
India 5000
India 4500
Hi every one,
I have the data like
keyID, country
101 FR
101 IT
101 –
102 –
102 –
102 US
The Output should look like
101 FR,IT,-
102 -,-,US
can someone help me how to write the script in QlikView.
Hi, thanks for a great post. I just discovered it myself when trying to concatenate a table where I had used DISTINCT and found my desired field completely disappeared from the target table! However, I am not able to solve this through making a temporary table and then concatenating it, which I understand is your suggestion.
Excellent post! I couldn’t understand why duplicate records were getting dropped until after reading your article. Thank you so much!
Hi Every one, Thanks much for this awesome thread. really interesting.
by any chance distinct with Joins/concatinations behave differently in both Qv 11 and Qv12..
why because when we are migrating to Azure cloud where the Qv version is 12 and where as our local server is Qv11, we are getting slight variance in the values even thought the script and apps are same to same.
Please help us as we are stuck here from days. Thanks Much