DATA TRANSFORMATIONS IN FLAT FILES
Data Transformation is the process of modifying the existing data to a new data format. It can also involve filtering out or adding some specific values to the existing data set. QlikView can carry out data transformation after reading it to its memory and using many built in functions.
Original data have been rarely in a state such that it can be used directly in QlikView without any modification. QlikView is powerful enough to allow modification of data when the data sources are flat files such as Excel or text files.
Flat files are common data sources for QlikView documents. Since we can’t always control how the data is formatted and arranged in the flat files, we have plenty of options when it comes to transforming this data into something we can use in our QlikView application. Most of these transformations are executed in the File Wizard: Transform dialog (also referred to as the Transformation Wizard).
This chapter describes the modification of data through the various QlikView transformation functions that can be used with flat files such as Excel, CSV, TXT, XML, or HTML.
WHY DATA TRANSFORMATION?
Flat files are usually not perfectly formatted for use in QlikView; they may contain junk rows, columns, or fields; have blank fields; have repeating columns requiring unwrapping; require rows and columns to be rotated or transposed; or have table types considered to be of cross table or generic types. Fortunately, when connecting to a flat data file, the Transformation Wizard allows you to manipulate the data before loading it into your QlikView document.
This will allow you, as a developer, to finely process the source data and turn it into a clean design, while at the same time keeping an efficient script.
USING THE TRANSFORMATION WIZARD
The Transformation Wizard can be accessed from the Script Editor, when you are connected to a flat file such as a table file. Once you have selected the file, click on the Enable Transformation Step button to start the Transformation Wizard.
REMOVING GARBAGE FROM DATA FILES
Let’s start using the Transformation Wizard by opening an .xls file and removing some of the garbage (unnecessary) rows and columns from the data. Download the Hospital_General_Information_Chap3 .xls spreadsheet (this data is adapted from multiple healthcare and hospital spread sheets publicly available at http:// www.data.gov ):
Related Page: Learn SET Analysis Syntaxes, Examples In QlikView
Related Page: Learn SET Analysis Syntaxes, Examples In QlikView
USING THE FILL FUNCTION
The Fill function in QlikView is used to fill values from existing fields into a new field.
It allows you to fill in field data from data cells adjoining the target field. In this example, note row 18 (Oroville Hospital – Clinic) has a value TBD for the provider ID column. Assume that we know from the business requirements that all fields marked TBD are for hospitals that are child facilities of the main hospital facilities. Let’s remove any fields marked TBD and replace the value with the value of the field directly above it (in this case, the provider ID of the parent facility).
Related Page: How To Use Visual Cues And Alternate States In QlikView?
UNWRAPPING DATA FROM REPEATING COLUMNS
The Excel file we are using for this example has repeating columns (1 to 4 and 5 to 8). We want to transform this data by moving all the columns from 5 to 8 and appending them to their corresponding columns from 1 to 4. We will accomplish this by using the Unwrap feature of the Transformation Wizard:
Column Manipulation is a type of Data Transformation, in which a new column is populated with values from an existing column which meets certain criteria. The criteria can be an expression which is created as part of the Data Transformation step.
Now that we’ve unwrapped the columns, we can further manipulate columns if we want to. Select the Column tab in the Transformation Wizard. In this tab, you can move or copy data from one column to an existing column or a new column. In this example, we will copy the provider ID column (column 1) to a new column we will create using this wizard (column 5). Duplication of columns within QlikView may be useful in building expressions later when designing sheet objects. This may also be done in the script itself by loading the same column twice and renaming the second column with an alias.
TRANSFORMATION SCRIPT RESULTS
When you click on Finish in the Transformation Wizard, the script code that is generated appears in the Script Editor.
Note that it’s easier to let the Transformation Wizard generate this code, and if anything needs to be tweaked a bit, such as changing a row position or column label name, it can be done in the script itself.
The Rotating table in QlikView is similar to the column and the row transpose feature in Microsoft Excel but with some additional options. We can transpose columns in multiple directions and they give different result.
It can be useful in data modeling to rotate a table in any direction or transpose (swap) rows and columns. Let’s try it using the rotate function of the Transformation Wizard. This is very similar to the crosstable function of the Transformation Wizard, but doesn’t allow for aggregation of column data. From the Script Editor, click on the Table Files button and, in the Open Local Files window, navigate to your copy of the file US_Oil_Imports_Chap3. xls Excel file (this file is adapted from the publicly available file at http:// www.data.gov ), and click on Next. The following screenshot illustrates the data fields of this .xls file:
1. In the File Wizard: Type dialog box, choose the Selected_Imports $ tab in the Tables drop-down list (this will select the correct tab in the Excel file), then sets the Header Size value to 1 line (this will remove the table title in row 1). Then, choose Embedded Labels in the Labels drop-down list.
2. Click on Next, and then click on the Enable Transformation Step button. Select the Rotate tab, then click on the Left button to see the table rotate counter clockwise. Click on the Right and the table moves back to the original orientation. In this fashion, you can rotate the table to whichever orientation you think makes sense for your needs. Click on Undo to return the table to the original orientation.
3. With the table in original orientation, click on the Transpose button. The table rows and columns swap positions. This orientation may be useful and is close to the crosstable function discussed in the next section. Click on Next twice and then on Finish to insert your transposition code into the script if desired. The transposed table can also be viewed in the wizard Results table window, just before clicking on Finish. Note that the month dates have now swapped places with the Region and Source key codes. This can be used in limited situations, but crosstable functionality (see the next section) may be more useful, however.
WORKING WITH CROSS TABLES
Cross tables are a common table type that creates problems in aggregation and analysis when trying to use them in the QlikView data model. The US_Oil_Imports_Chap3.xls spreadsheet worked within the preceding rotation exercise as a cross table, because the dates are stored as dimensions and you can’t properly aggregate all month data in the present format. We can convert this table into a normal table with the crosstable function of the Transformation Wizard as follows:
1. In the Script Editor, click on the Table Files button and navigate to the US_Oil_Imports_Chap3. xls Excel file. Click on Next, and then choose the Selected_Imports $ tab in the Tables drop-down list (this will select the correct tab in the Excel file). Then set the Header Size to 1 line (this will remove the table title in row 1). Then, choose Embedded Labels in the Labels drop-down list, and then click on OK.
2. Click on Next twice, and then click on the Crosstable button. In the Crosstable dialog box, select 2 for the Qualifier Fields (since we have 2 rows describing the region dimensions). In the Attribute field, enter the Date (note the color coding of the rows and column areas), and in the Data field, enter Barrels. Click on OK.
3. The Results tab in the Transformation Wizard allows you to view the results of the crosstable function. Note that the dates are now rolled up in one dimension.
4. Click on Finish, and the code is inserted into the script pane of Script Editor. Note in the code the prefix line of the Load statement, named CrossTable.
CrossTable( Date, Barels, 2)
Load * From
(biff, embedded labels, header is 1 lines, table is [Selected_Imports$]);
WORKING WITH GENERIC TABLES
Generic tables are very common tables; also called standard tables. These tables have less organization and have many different dimensions and measure units. Each dimension or unit may have its own field. In this example, we will review a scripting solution to create separate tables of each dimension, and then combine the tables into one fact table:
Related Page: Layout Tips For Developers – QlikView Scripting
open Generic_Table_Loading_Chap3.qvw. This QlikView application is a prototype tool for tracking software defects (noting software defect groups, priorities, and stages of development). Open the Script Editor and note the inline table load that builds a generic table.
LOAD * INLINE [
Defect, Group, Priority, Stage
1, Reports, High, Queued
2, Data, Low, Reported
3, Data, High, Closed
4, Object, Medium, Rejected
5, Security, High, Reported
6, System, Medium, Rejected
7, Security, High, Closed
8, Object, Low, Queued
9, Data, Medium, Closed
10, Requirements, Medium, Verified
GENERIC LOAD Defect, ‘WorkStep_’ & Stage, ‘x’ RESIDENT DefectTable;
This result may be fine in many circumstances, but one very useful tip is to clean up these tables and concatenate them into one larger fact table. To do this, place this code in the script after the generic load (modify it for your own example):
FOR t = 0 to NoOfTables()
LOAD TableName( $( t)) as Tablename
WHERE WildMatch( TableName( $( t)),
FOR t=1 to FieldValueCount (‘Tablename’)
LET vTable = FieldValue (‘Tablename’, $ (t));
LEFT JOIN (DefectTable) LOAD * RESIDENT $ (vTable);
DROP TABLE $ (vTable);
DROP TABLE TableList;
WORKING WITH HIERARCHIES
In Relational database, you must have come across dimensions which has a different level of granularity. When one field/level is correlated or divided into its several other subfields/sublevel for detailed analysis of data, then together all these fields constitute a hierarchy.
Hierarchies are an important part of all business intelligence solutions, used to describe dimensions that naturally contain different levels of granularity. Some are simple and intuitive whereas others are complex and demand a lot of thinking to be modeled correctly. QlikView has its own way of dealing with hierarchies using the hierarchy keyword.
Related Page: How Debugging Works In QlikView Script Debugger
The hierarchy function of QlikView is powerful for creating parent-child relationships between two or more values. A quick example of a hierarchy may be: the Solar System is the parent of the child Sun and Planet, and Planet is the parent of the child Earth. With the hierarchy functions set, QlikView can easily use this data to create tree-view tables.
Let’s consider another example, using a hospital surgery department structure:
Get Updates on Tech posts, Interview & Certification questions and training schedules