Regular expressions (or “regex”) are a developer’s best friend, they offer an extremely powerful way to search for or replace patterns in strings of text.
This post assumes that you already know how what regular expressions are and how to construct them. If this is not the case, then you may want to read up a little bit here before proceeding. Trust me, if you regularly have to deal with validating and cleaning input data then regular expressions are well worth your time.
Unfortunately, QlikView does not (yet*) natively support regex, however, there is a way to use regular expressions in your QlikView load script: VBScript.
Using VBScript, we can create functions that add regex functionality to our load scripts. There is just one caveat; since VBScript functions are processed slower than QlikView’s built-in functions, you should always do thorough performance testing before deploying the application. (Of course, you should always test before deploying any application, I just want to stress the point.)
So, how can VBScript regex functions be used in a QlikView load script? The accompanying example qvw shows how it’s done. In the example application I have defined three VBScript functions that can be used to add regex functionality to your load scripts:
- RegExTest: test if (part of) an input string matches a defined pattern. This is a very useful function to perform validation, for example to check if an input field contains a valid email address.
- RegExReplace: replace (part of) an input string that matches the specified pattern with another string. This function can be used to reformat input data, for example to apply the same format to all telephone numbers.
- RegExFind: Extract (multiple parts of) a string that matches the defined pattern. When multiple matches are found a concatenated string is returned. This could be used to extract all email addresses from a piece of text, or to extract all external links from an html file.
Download the example qvw containing the functions and examples of how to apply them. If you have any questions then feel free to ask them in the comments section.
* Of course, the best option would be for QlikTech to include regular expressions as a built-in function. For those of you with access to Q-Force, please support the product enhancement request for regular expressions by going here.
13 Comments
Thanks a lot, very usefull!.
Greetings from Mexico!
So Nice!
what will be the regex for email extraction?
Great example, but where do I find the VB-script code in QlikView?
Ctrl + M
Issue:
In my test script load
This returns ‘Yes’
IF (RegExTest(’10:’, ‘^([01]?[0-9]|2[0-3]):’) = -1, ‘Yes’, ‘No’)
This returns ‘No’
IF (RegExTest(’10:5′, ‘^([01]?[0-9]|2[0-3]):\d’) = -1, ‘Yes’, ‘No’)
This returns ‘Yes’
IF (RegExTest(’10:g’, ‘^([01]?[0-9]|2[0-3]):g’) = -1, ‘Yes’, ‘No’)
I’m going round in circles trying to sort this because I’m expecting both to return ‘Yes’
Whenever I have a digit after the colon it returns ‘No’
Whenever I have a character after the colon, it returns ‘Yes’.
The pattern is alright because I can test it in RegularExpression_Example.qvw at https://community.qlik.com/docs/DOC-4587 and it returns a match. The only difference is that it uses a Sub call instead of a Function call.
Basically I’m trying to build a regex pattern to test for valid time format in my script load, so must use a Function call. Any ideas why this is failing?
Hi Kevin,
That’s what you get when you’re trying to solve a problem with regular expressions, suddenly you have two problems 😉
I am a bit pressed for time, so unable to completely go through the regex. Parsing time formats is discussed in many places though, for example here: http://stackoverflow.com/questions/7536755/regular-expression-for-matching-hhmm-time-format
Kind regards,
Barry
I just spent a lot of time trying to debug a regex replacement that looked like this:
RegExReplace([Code], ‘^([^ ]*)([ -]*)(.*)$’, ‘$3’) As [Code Description]
It turned out that QlikView was helpfully seeing the ‘$3’ as a potentially numeric quantity and turning it into just ‘3’ so my replacements would not work. As soon as I added some text around the replacement (e.g. ‘[$3]’) it worked, so the solution was to use Text() to force the replacement pattern to be left alone.
That is:
RegExReplace([Code], ‘^([^ ]*)([ -]*)(.*)$’, Text(‘$3’)) As [Code Description]
worked for me.
Hello,
I am running that but gives me this error:
Syntax error
Unexpected token: ‘)’, expected one of: ‘,’, ‘,’, ‘OPERATOR_PLUS’, ‘OPERATOR_MINUS’, ‘OPERATOR_MULTIPLICATION’, ‘OPERATOR_DIVISION’, ‘OPERATOR_STRING_CONCAT’, …
Cleandata:
LOAD
ID,
if(RegExTest([Phone number], ‘^\d{3}-\d{4}’>>>>>>)<<<<<< = -1, 'Yes', 'No') as [Input phone number is valid], // Test if the phone number consists of three digits followed by a hyphen and four digits
RegExReplace([Phone number], '^(\d{3})(\d{4})', '$1-$2') as [Cleaned phone number], // Insert a hyphen between the third and fourth digit, if it is not already there
RegExReplace([Customer name], '^^(\w+)\s(\w+)', '$2, $1') as [Cleaned customer name], // Reformat the customer name in "Lastname, Firstname" format
RegExFind([Birth date] , '\d{2}-\d{2}-\d{4}', ';') as [Cleaned birth date] // Extract the birth date (a string in the form of xx-xx-xxxx) from the unclean, raw data
RESIDENT Inputdata WHERE RegExTest(ID, '^\d{3}[A-Z]{3}')
Could you please help me on that
I don’t know if you still need an answer but that code doesn’t work because macro function being called expect a boolean variable as a last parameter.
Try RegExTest([Phone number], ‘^\d{3}-\d{4}’, ‘false’)
Is there a way to make this kind of macro functions in load script work on QlikView Server running the reload of the document through a task?
The document you provided won’t work if executed through a QMC task even if you allow macro execution in QMC settings panel.
[email protected]
You need to put ///$bnf off at the very start of your script. Your QV app is too modern so it doesn’t like the syntax in the load script;