“Code is read more often than it is written”
– common programming adage
As someone who often has to look at/modify QlikView script written by others, I have come to appreciate the value of readable script. Rather than taking an “if it works, it works” attitude, I always try to write my script with readability and consistency in mind. Somewhere down the line someone else (or worse, you) might have to decipher what a specific script is doing in order to modify or extend it.
Today I want to ask your opinion on the readability of a specific part of the QlikView script, indentation of aliases within LOAD statements. To this end, I’ve taken a piece of script from the QlikView System Monitor and formatted the aliases in three different ways. (note: I don’t agree with the absence of white space in functions and between operators, but have left that as is)
Have a look at the three options, and then please answer the two multiple choice questions below.
Option 1
Option 2
Option 3
You might wonder what the difference between option 2 and 3 is, but I’ll let you discover that for yourself. If you don’t see it, just pick whatever you found most readable.
I’d be very interested in hearing your motivation for picking one option over the other, especially if you answered Something else to the second question. If you’d like to elaborate, or have any other thoughts around this subject, I welcome you to share them in the comments section below.
23 Comments
Another thing I do is adding the commas in front i.e.
LOAD
@1 AS Field1
,@2 AS Field2
,@3 AS Field3
IMHO this makes it easier to find errors and to comment in / out some fields without having to bother about the commas.
i like commas first as well – my old habbit from SQL world 🙂 that way if you delete a column, you script will still work fine unless it is a very first column
Yeah, I am aware of the comma at the front of the line thing. Personally, I don’t really find it aesthetically pleasing (but that shouldn’t be a reason in itself), don’t really have a problem with missing comma’s and find the overhead of moving all the comma’s too much hassle (the code generated by a table wizard puts them at the end of the line).
To me, a comma means the end of a ‘statement’, so it should be on the same line as that statement. Also, it just shifts the problem from the last column to the first column, although it’s more likely that you’ll be adding extra fields at the end instead of extra fields at the start.
Most of this probably sounds arbitrary, my opinion is you should come up with some sort of convention for yourself and stick with it. And it works best if this convention is something that’s commonly used, as mixing two different conventions (for example, when you inherit work from another developer) is probably worse than no convention.
Hi Barry,
A lot of the code I work with I inherit, so mine is a right mix. Probably a hybrid of one and two. If I wanted to enforce the clean look of 3 I would probably add the (perhaps superfluous) line:
Type AS Type,
The other hot battle with code (particularly SQL code) is commas to the left or to the right of each line. Personally, mine go to the right.
Steve
i really like option 2 (3rd one looks confusing to me) and I try to do this formatting whenever I can. However it is easy to get this out of control in QV and it does not help that QV editor does not have any beautifier / formatter which are very common in SQL world. So it is normally a mix of option 1 and 2 – but i would prefer 2.
For the first column, I align it with the comma in front of the second column:
LOAD
c1
,c2
;
not :
LOAD
c1
,c2
;
I also like to align parentheses and type a closing one before i start writing the expressions inside them:
Also like to align statements and put ; on a separate line like that:
LOAD
c1
,c2
;
Yes, I also put the semicolon on a separate line, it makes it easier to modify/add a where-clause later and creates a natural white-space between statements (I use two additional returns between LOAD statements, with the exception of statements that are directly related to each other, such as JOINs or CONCATENATEs)
It’s interesting to see by the way that (so far) roughly 80% prefer option 2 or 3, but more than 50% say they use option 1 in their scripts.
I use Option3, but with commas and semikolons as first char, like Borys.
in SQL i also do
SELECT [TAB] @1
,[TAB] [TAB] @2
FROM [TAB] my.table
WHERE [TAB] @1 [TAB] [TAB] ‘myvalue’
AND [TAB] [TAB] @2 [TAB] ….
;
.. and so on that i have straight columns
i like it when code is readable without comments.
happy coding!
plex
oh on my comment the brackets went away… (between the Tabs there sould be greater/smaller things.. and so on.. )
I Use option 2
I Also Use the same statement as Steve for aligning fields without a alias, example type as type
Hi Barry,
Being a meticulous person by nature I think about that as well and find it hard not to adjust inherited scripts.
I am not a fan of having so much space between fields and aliases as it takes too much time to move your eyes from one point to another on the same line. The same applies for screens in various software packages (ERP and bespoke) where 90% uses your options 2/3. for example it is very easy to mix up a field and alias in lines 2 and 3 as they are too much separated. It is also hard to align it when there are long functions, etc.
In complex function and mathematical expressions I like to put a blank space between parameters/operator to increase readability.
I use a mix like in the following excerpt:
load
upper(@5) as AuthenticatedUser,
@1 as AuditserverStarted,
@2 as ActionTimeStamp,
SomeField,
date(@2) &’_’& time(@2, ‘hh’) &’_’& time(@2, ‘mm’) as DateHourKey,
Qty * Price as Amount,
if(index(@6, ‘\’) > 0, subfield(@6, ‘\’, 1), ‘Selection’) as Action,
textbetween(mid(filename(), 1, index(filename(), ‘.’) – 1) &’_’, ‘_’, ‘_’) as QVSClusterNodeA
Cheers
I see now it is not seen correctly as I also like indenting in nested functions/loops and load statmemtns like here:
load
…Field1 as Description1,
…Field2 as Description2
I want the fields on their own lines so I always put the LOAD statement on a line by itself by hitting enter before that first field. Then I highlight the entire field list and hit Shift-Tab 5 times, followed by a single tab. That gets all the fields lined up and indented to the first tab stop. And while the fields are still highlighted I right-click and comment them all out. I then just uncomment the ones I want to use (at least during initial development). It’s a little bit of a pain to do this every time but it’s second nature for me now. Too bad their isn’t a way to customize the layout the wizards use.
Back in the days of COBOL, we were taught to put the PIC(X) phrase in column 44, always.
Rebelling from that formality, I go with a hybrid most similar to your option 2, but rather than put the AS far enough to the right to allow for the longest input syntax, I’ll group them in sections so that the AS is not more than an inch or two away from the statement that it is modifying.
I do it like this;
Load
A,
B,
C,
Capitalize(subfield(c,’;’,1) as C,
Capitalize(subfield(c,’;’,1) as D,
Capitalize(subfield(c,’;’,1) as E
Capitalize(subfield(c,’;’,1) as F,
G,
G as H,
G as J,
G as K,
L,
M
From ……..
By posting my comment the tabs where deleted. Before every alias i use a tab, for the first alias 1 tab, by the second alias 2 tabs, thirt alias 3 tabs etc etc.
……So the
………..Script look
………………….like this
(Whitout the dots)
This is not visible in my comment, sorry.
I agree that option 3 is neater. I personally prefer to group related variables together (if order is not important), add new lines between expressions, and wrap my named variables in brackets.
Here’s an example:
LOAD
Autonumber(@1) as [%some ID],
Autonumber(@2) as [%someother ID],
@3 as [Something 1],
@4 as [Something 2],
Upper(@5) as [Something Else],
// Format date as following “MM-DD-YY_hh_mm”:
Date(@5, ‘MM-DD-YY’) & ‘_’ & Time(@6, ‘hh’) & ‘_’ & Time(@6, ‘mm’) as [Some Date],
Date(@4) as [Some Other Date],
// Format version information in a meaningful way, i.e. “Product Version #”:
Pick(
WildMatch(variable, ‘abc1’, ‘abc2’, ‘abc3’),
‘ABC version 1’,
‘ABC Version 2’,
‘ABC Version 3’
) as [Version]
Hi Barry,
Good topic!
I also put a comma first and semi colon on a new line
I use tabs after a comma.
Example:
LOAD
Field1
, Field2
, IF(Field1 = 1, ‘ABC’, ‘DEF’) AS Field3
RESIDENT table
WHERE
Field2 = 2014
;
I use Option 3 as I find it the easiest to follow. Unfortunately QlikView does not allow prefixing a field definition with its alias, eg. [Alias] = ColumnName which is what I like to do in SQL Server because I don’t code in Arabic ( read left-to-right) but, hey, Option 3 is the next best thing.
Howdy,
I use Option 1 although I did dabble for a while moving all the aliases to the right. However as I always emphasis white space between blocks of code I found it was making it more difficult to quickly identify an expression as there was too much white space (probably moving them too far).
For me personally indenting blocks of code, white space between code and commenting a lot makes code more readable.
Also comma at the end of the line obviously :O).
Hi Barry,
I’ll prefer a mixed code-structure depending on the kind of load-statement and the amount of fields and transformations. Often I try to write very compact code with several fields in one line – if the amount of fields are larger and/or the transformations more complex I use indentations and/or cluster the fields. The readability from option 2/3 isn’t good enough to make such more efforts.
For me it’s more important to have a better view on the chain of load-statements.
– Marcus
I’m in favour of aligning the aliases (option 2) and using leading commas. Both require a proper editor that can automatically align and that provides a column selection. Currently Notepad++ is the only half-capable editor out there with a Qlikview mode.
Option 3 I find very confusing because there’s nothing on the left to correspond to the unaliased fieldnames on the right, but I automatically look for it and get lost. In any case it seems unnecessary to mix aliased and unaliased fields like this (and certainly not positional and named fields…)
I like to use preceding loads to assign meaningful names to everything before doing any logic. It can be a good way to separate blocks of logic as well.
Load
AuthenticatedUser
, AuditServerStarted
, AuditMessage
, SomeField
, AnotherField
, Object
, Action
, SheetID
, SelValues
, QVSClustNodeAudit
, Timestamp
, Timestamp As ActionTimestamp
, DateHourKey
, Date2
, SessionDocName
, DocFolder
, DocFullPath
;
// QVS Cluster Node comes from the file base name
Load *
, SubField(_FileBaseName & '_', '_', 2) As QVSClustNodeAudit
;
// Parse AuditMessage
Load *
, SubField(AuditMessage, '\', -1) As Object
, If( Match(AuditMessage, '\')
, SubField(AuditMessage, '\', 1)
, 'Selection') As Action
, If( Match(AuditMessage, 'Activated sheet Document')
, SubField(AuditMessage, '\', 2)
) As SheetID
;
// Split DocFullPath into folder and document
// There should be a function for this...
Load *
, Capitalize(
SubField(DocFullPath, '\', -1)) As SessionDocName
, Left(DocFullPath
, Index(DocFullPath, '\', -1) // last path separator
- 1) As DocFolder
;
// Cast / convert / format
Load *
, Upper(_Username) As AuthenticatedUser
, Date(Floor(Timestamp)) As Date2
, Date(Timestamp, 'dd_MM_yyyy_hh_mm') As DateHourKey
;
Load
@1 As AuditServerStarted
, @2 As Timestamp
, @3 As DocFullPath
// @4 is not used?
, @5 As _Username
, @6 As AuditMessage
, @7 As SelValues
// ? mixing positional and named fields ?
, FileBaseName() As _FileBaseName
, Type
, SomeField
, AnotherField