Regular expressions in the load script

By Barry Harmsen

Regular expressions (or “regex”) are a developer’s best friend, they offer an extremely powerful way to search for or replace patterns in strings of text.

This post assumes that you already know how what regular expressions are and how to construct them. If this is not the case, then you may want to read up a little bit here before proceeding. Trust me, if you regularly have to deal with validating and cleaning input data then regular expressions are well worth your time.

Data validation and transformation using regular expressions, download the example application

Unfortunately, QlikView does not (yet*) natively support regex, however, there is a way to use regular expressions in your QlikView load script: VBScript.

Using VBScript, we can create functions that add regex functionality to our load scripts. There is just one caveat; since VBScript functions are processed slower than QlikView’s built-in functions, you should always do thorough performance testing before deploying the application. (Of course, you should always test before deploying any application, I just want to stress the point.)

So, how can VBScript regex functions be used in a QlikView load script? The accompanying example qvw shows how it’s done. In the example application I have defined three VBScript functions that can be used to add regex functionality to your load scripts:

  • RegExTest: test if (part of) an input string matches a defined pattern. This is a very useful function to perform validation, for example to check if an input field contains a valid email address.
  • RegExReplace: replace (part of) an input string that matches the specified pattern with another string. This function can be used to reformat input data, for example to apply the same format to all telephone numbers.
  • RegExFind: Extract (multiple parts of) a string that matches the defined pattern. When multiple matches are found a concatenated string is returned. This could be used to extract all email addresses from a piece of text, or to extract all external links from an html file.

Download the example qvw containing the functions and examples of how to apply them. If you  have any questions then feel free to ask them in the comments section.

* Of course, the best option would be for QlikTech to include regular expressions as a built-in function. For those of you with access to Q-Force, please support the product enhancement request for regular expressions by going here.

About The Author

Barry Harmsen

Hi there, I'm Barry and I'm a Business Intelligence Consultant at Bitmetric and based in the Netherlands. Originally from a background of 'traditional' Data Warehousing, Business Intelligence and Performance Management, for the past 10 years I have been specializing in Qlik and a more user-centric form of BI. I have done numerous QlikView and Qlik Sense implementations in many different roles and industries. In 2012 I co-authored the book QlikView 11 for Developers. You can follow me on Twitter at @meneerharmsen.

9 Comments

  • 1
    Joel Orta
    December 10, 2012 - 19:15 | Permalink

    Thanks a lot, very usefull!.

    Greetings from Mexico!

  • 2
    ET
    May 16, 2014 - 18:08 | Permalink

    So Nice!

    what will be the regex for email extraction?

  • 3
    Torstein Lunde
    September 27, 2015 - 05:13 | Permalink

    Great example, but where do I find the VB-script code in QlikView?

  • 5
    KevPickering
    October 6, 2016 - 15:09 | Permalink

    Issue:
    In my test script load
    This returns ‘Yes’
    IF (RegExTest(’10:’, ‘^([01]?[0-9]|2[0-3]):’) = -1, ‘Yes’, ‘No’)

    This returns ‘No’
    IF (RegExTest(’10:5′, ‘^([01]?[0-9]|2[0-3]):\d’) = -1, ‘Yes’, ‘No’)

    This returns ‘Yes’
    IF (RegExTest(’10:g’, ‘^([01]?[0-9]|2[0-3]):g’) = -1, ‘Yes’, ‘No’)

    I’m going round in circles trying to sort this because I’m expecting both to return ‘Yes’

    Whenever I have a digit after the colon it returns ‘No’

    Whenever I have a character after the colon, it returns ‘Yes’.

    The pattern is alright because I can test it in RegularExpression_Example.qvw at https://community.qlik.com/docs/DOC-4587 and it returns a match. The only difference is that it uses a Sub call instead of a Function call.

    Basically I’m trying to build a regex pattern to test for valid time format in my script load, so must use a Function call. Any ideas why this is failing?

  • 7
    William Rose
    October 11, 2016 - 17:15 | Permalink

    I just spent a lot of time trying to debug a regex replacement that looked like this:

    RegExReplace([Code], ‘^([^ ]*)([ -]*)(.*)$’, ‘$3’) As [Code Description]

    It turned out that QlikView was helpfully seeing the ‘$3’ as a potentially numeric quantity and turning it into just ‘3’ so my replacements would not work. As soon as I added some text around the replacement (e.g. ‘[$3]’) it worked, so the solution was to use Text() to force the replacement pattern to be left alone.

    That is:
    RegExReplace([Code], ‘^([^ ]*)([ -]*)(.*)$’, Text(‘$3’)) As [Code Description]

    worked for me.

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    Read previous post:
    Deleting script variables

    This post was inspired by a gentleman I met at a QlikTech Partner meeting who was adamant that a script...

    Close