Generating random test data in QlikView with Mockaroo

By Barry Harmsen

Functions to create random data in QlikView


I often need random data. For example to use in an example file on this blog, to deliver a customer demo or just to test out something new. As I am probably not the only one with this need, here’s a short and sweet post on how to generate random data for use in QlikView.

I usually create random data by using variables that generate different types of random data. Some of these are shown in the image above. For dimension labels I use mapping tables containing, for example, first and last names or company names. Then I put together some load statements like the one shown below and voila, random data.


Random data load statement in QlikView


While this approach works well it can be quite a hassle, especially if you want to create dimension values that look plausible. You can imagine that I was pleasantly surprised when I caught a tweet by Qlik’s Michael Tarallo yesterday in which he mentioned an online tool for generating test data: Mockaroo.

Mockaroo - random test data generator

This tool lets you generate dozens of different data types, from random dates, string and numbers to cities, names and even shirt sizes. You can generate up to 100,000 rows of random data, which can be exported to many different formats, CSV being the most useful for QlikView. You can even save your schemas and retrieve them directly from Mockaroo in QlikView:


Random data from Mockaroo in QlikView


While the 100,000 row limit is on the low side when you want to test the performance implications of various approaches, this does seem to be a very useful tool for customer demo’s. Alternatively, you can use this tool to generate your dimension tables while using the ‘old’ approach to generate big fact tables. In any case, if you ever need some realistic test data then Mockaroo is definitely worth a look.

Some additional tips for nice, random data in QlikView:

  • You can use the NormInv() function in QlikView to generate random values that follow a normal distribution, in my opinion this makes for much nicer, more realistic charts.
  • You can use the Sin() or Cos() functions to create (relatively) realistic seasonal patterns in your data.

 

About The Author

Barry Harmsen

Hi there, I'm Barry and I'm a Business Intelligence Consultant at Bitmetric and based in the Netherlands. Originally from a background of 'traditional' Data Warehousing, Business Intelligence and Performance Management, for the past few years I have been specializing in QlikView and a more user-centric form of BI. I have done numerous QlikView implementations in many different roles and industries. In 2012 I co-authored the book QlikView 11 for Developers. You can follow me on Twitter at @meneerharmsen.

8 Comments

  • 1
    March 14, 2014 - 04:39 | Permalink

    Good catch. Creating a random sales amount is easy, but the hard part is creating random costs, payments, discounts that are related to the original sales amount and make enough sense to look like real data.

    Karl

    • 2
      March 14, 2014 - 20:14 | Permalink

      Hi Karl,

      I would generate the random sales amount and sales date in the first load and then use preceding load to generate random costs and payments that are in line with the sales amount. You can generate all of this in a big, flat table and split it out later if necessary.

      It’s true that getting everything to look real can be quite a challenge. One of the things I like to use is the NormInv() function instead of Rand(). This makes the data much more plausible. For example, many salespeople/products/employees/etc will be near average performers but there’s always a few that are stand-out performers (and underperformers). Using NormInv() makes your charts look much nicer too, instead of the flat, linear distribution that Rand() gives you.

      You can also use a sine or cosine function to generate a pattern that looks seasonal. First generate a table of 365 days that follow the sin/cos wave. Next, loop through that table and autogenerate the amount of records that was calculated for each day.

      Additionally, if this is a sales demo, I will add a few ‘discoveries’ to the data manually. Discovering these nuggets during a demo really makes for a much more compelling demo and better illustrates the value of QlikView.

      Kind regards,
      Barry

  • 3
    March 15, 2014 - 05:14 | Permalink

    Nice tip Barry. I have used QlikView hot key + Q + Q to generate random data in the past, but this tool seems worth looking at to generate random data for demo apps even for blog articles.

    Thanks for sharing!

    Shilpan

  • 4
    srini
    March 17, 2014 - 15:27 | Permalink

    thank you for great post. mackaroo is so awesome tool

  • 5
    Nils Teller
    March 18, 2014 - 14:53 | Permalink

    Thanks !

    Found another tool: http://www.generatedata.com/
    You can use it on your own server …

  • 6
    Felix Meadows
    February 2, 2016 - 22:54 | Permalink

    I use often http://www.yanDATAellan.com tool to generate my test databases; it is simple, rich of functionalities, and user-friendly. You could generate up to 10.000 rows of mock data in several file formats (CSV, Excel, SQL, JSON, HTML, and XML). Take a look!

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    Read previous post:
    QlikView hash functions and collisions

    I'm currently updating my materials for the upcoming Masters Summit for QlikView in Chicago, and thought I'd share a little...

    Close