In QlikView, the best practices for data modeling deals with maintaining a well-structured data model and suitable to enhance data processing and analysis. Data modeling is a process of organizing data from various data sources to a single design schema that helps to analyze the combined data. It is a collection of dimension and fact tables that are connected through common fields. We can slice the data quickly in many ways by having a proper data model.
In this blog, we will learn the following:
- Combining tables
- Creating a data interval
- Data cleansing
- Generic databases
- Linking information to field values
- Matching intervals to discrete data
- Semantic links
- Using mapping instead of joining
- Using QVD files for incremental load
1. Importance of Data modeling
QlikView performs a better way when the data model is well designed and structured. A good data model ensures the quick data process, evaluates expressions, and provides accurate results because the data is arranged properly so that system takes a logical way to load and evaluate the data values.
As we know, the data model consists of dimension and fact tables that are connected with the fields. There are often challenges in dealing with dimensions that contain multiple granularity levels and large data volumes or fact tables. The performance of the Data model depends on how data is loaded into the QlikView document and also depends on how data is structured to achieve the best way of the data model.
2. Best Practices for Data modeling in QlikView
In general, the data loaded into the QlikView document can be explained by the ETL process. But, it is essential to create a data model that enables efficient data handling methods in QlikView. The normalized star schema or Snowflake schema without any circular references where every entity is stored in separate tables.
A central fact table contains keys to the dimensions and the numbers used to calculate measures such as the number of sales, units, etc. The surrounding connected tables contain the dimensions with all the attributes such as product, category, suppliers, etc. Each table in the data model corresponds to either Load or Select statement.
The following are the Best practices for Data modeling in QlikView.
In QlikView, the Join and Keep prefixes are used to combine two tables that may or may not have common fields. The Join is a simple way to combine tables by using Join prefix in the script, that joins with the internal table or with another table or with an existing table. In QlikView, joins can be made in the script, producing logical tables.
Example for Join:
LOAD x, y, z from table1.csv;
Join LOAD x, v from table2.csv;
The resultant internal table has the fields x, y, z, and v. The no.of records may differ depending on the field values from the two tables.
The Keep prefix is developed to reduce the no. of cases where the explicit joins need to be used. The Keep prefix between Select or Load statement has the effect of reducing tables to the intersection of table data before they are stored in QlikView. The Join and Keep prefixes must always be preceded by one of the keywords Inner, Left, or Right.
If the prefix Inner used before Join, it specifies that the join between the two tables should be an inner join. The resultant table contains the combinations between two tables with complete data set from both sides. If the prefix Inner is used before the Keep, it defines the two tables should be deduced to their common convergence before stored in QlikView. The same way the Left and Right prefixes are used to combine two tables as per their functions.
Creating a data interval
In a few cases, time intervals don’t store explicitly with a beginning and an end. But, they are implied by only one field- the change timestamp. For example, when you have currency rates of multiple currencies. Each currency rate changes with a new conversion rate. Also, the table contains rows with no dates to the initial conversion rate, before the first change was made.
We can create a FromDate, and ToDate intervals using the script commands given below.
Let vBeginTime = Num('1/1/2020');
Let vEndTime = Num('1/3/2020');
Let vEpsilon = Pow(2,-27);
LOAD Currency, Rate,
Date(If(IsNum([Change Date]), [Change Date], $(#vBeginTime))) as FromDate
(ooxml, embedded labels, table is Sheet1);
Subscribe to our youtube channel to get new updates..!
LOAD Currency, Rate, FromDate,
Peek(FromDate) - $(#vEpsilon),
)) as ToDate
Order By Currency, FromDate Desc;
It updates the source table in the following manner.
This table can be used in comparison with an existing date using Intervalmatch methods.
When the data is loading from different tables, the field values denoting the same thing are not always named consistently. It lacks consistency and also hinders associations, so the problems need to be solved. This can be done elegantly by creating a mapping table for the comparison of field values.
When several tables listing countries are loading, we may find that the same country has several names, for example, U.S.A is listed as US, U.S, USA, United States. To avoid the occurrence of different records denoting the same country name in the concatenates table, create a table similar to that and load it as a mapping table. The script should be as follows:
Mapping LOAD x,y from MappingTable.txt
(ansi, txt, delimiter is ',', embedded
Map Country using CountryMap;
LOAD Country, City from CountryA.txt
(ansi, txt, delimiter is ',', embedded labels);
LOAD Country, City from CountryB.txt
(ansi, txt, delimiter is ',', embedded labels);
The mapping statements load in the file MappingTable.txt as a mapping table with the label CountryMap.
The generic database is a table in which the field names are stored as field values in one column, as the field values are stored in a second. These databases are used for the attributes of different objects. For example, a generic database contains two objects, a box and a ball. It has some attributes such as weight and color, are common for both the objects, other attributes like height, length, width is not so. It doesn’t matter whether a Load or Select statement is used to load the generic database.
Since many of the attributes are not relevant for a specific object so it is required to store the data giving each attribute a column of its own. On the other hand, it looks clumsy displaying it in a way that mixed lengths, weights, and colors.
Generic SELECT* from GenericTable;
Hierarchies are often used to represent things organization dimension in data. These kinds of hierarchies are generally stored in the adjacent node table, in a table where each record correlates to a node and has a field that contains a reference to the parent node.
Hierarchy (NodeID, ParentNodeID, Title, 'Manager') LOAD
FROM 'hierarchy.txt' (txt, codepage is 5212, embedded labels, delimiter is ',', msq);
In such a table, the node is stored on one record only but still has no.of children. The table may contain the additional fields defining attributes for nodes. The expanded node table is the common representation, where each level in the hierarchy is stored in a separate field. The hierarchy keyword can be used in the load script to transform an adjacent node table to expand.
The problem with the expanded nodes tables is that it finds it difficult to use the fields for searches or selections, prior knowledge is required about the level to search or select. The tale is a different representation that solves this problem, and such representation is also known as a bridge table. It contains one file for every child-ancestor relation found in the data. The “hierarchybelongsto” keyword can be used in the load script to transform an adjacent node to an ancestor table.
Linking information to field values
In a QlikView document, the information can be in the form of images, text, or files that can be associated with data. To link information, tables have to be created that contains what information file is to be linked to a particular field value, and such tables are known as information tables. The information table includes two columns, one column is for field names containing a list of values, and the other is headed by an arbitrary name that contains information.The statement for loading this table as an information table as follows:
Info LOAD Country, I from Flagsoecd.csv (ansi, txt, delimiter is ',', embedded labels);
When a list box or multi-box items are linked to the information is selected, an information icon occurs beside the name field to show the information is available. Using that icon, it enables to show the information or load an application file. It is possible to disable the icon on the Layout of the list box properties dialog.
Matching intervals to discrete data
The interval match prefix to a Select or Load statement is used to link the discrete numeric values to at least one numeric interval. It is done to maintain a good data model. The extended interval match is used for handling the problem of slowly changing the dimensions in the source data. The table to which the discrete data points will be matched should have at least two fields, i.e., start and end fields. The QlikView must read the discrete values before the function calls. If the intervals overlap in the table, they are linked with all corresponding or matching values.
Related Article: [QlikView IntervalMatch Function]
Semantic links are created by loading tables containing the relations between the objects. When clicking on a semantic link, a selection is made in other fields. The table must contain three or four columns. A semantic table must either contain the relation between field values of different fields or between field values of the same field. The Select or Load statement is used to load a semantic table must be preceded by the semantic qualifier that shows it is not a logical table.
Four columns are used in semantic links, the first column includes field value that has a relation with other fields, the second column consists of the name of the relation, the third column contains related field value. The fourth column contains names of the inverse relations. If three columns are used, the names of the inverse relations column are not required.
Using mapping instead of joining
Join creates large size tables, and the processing of the large tables may slow down the tool. So instead of Join, the Mapping function works more efficiently. A mapping table consists of two columns, one is the comparison field, and the other is the mapping value field. For example, we have a table of orders and need to know the country of the customer that is stored in the customer table.
To view the country of a customer, we need to map the table as shown in the below table.
The mapping table that we name MapCustomerIDtoCountry, can be defined in the below script.
Mapping LOAD CustomerID, Country From Customers ;
The next is to apply the mapping function, using ApplyMap function,
ApplyMap('MapCustomerIDtoCountry', CustomerID, null()) as Country
From Orders ;
ApplyMap function is used to define what to return when the value is not found in the mapping table.
The resultant table is given below:
[Related Article: QlikView Metadata]
Using QVD files for incremental load
Incremental load is defined as loading new or changed records from the database. The QVD files are recommended for better and fast loading of the incremental load. In most cases, it is possible to perform incremental load using QVD files. The basic process is given below:
Load new data from the database table (a slow process, but loading a limited no. of records.)
Load old data from the QVD file (It is a fast process, though it can load many records)
Create new QVD file.
Repeat the process for every table loaded
This process is repeated whenever the incremental load is to load from the database.
In QlikView, these are the best practices for Data modeling while developing the script. A good script results in a Good data model that provides better application performance. Try to apply these practices while developing the script. Hope you find relevant information.