Slowly Changing Dimensions SSIS Step by Step
Slowly Changing Dimensions
To process the data from granularity tables to main tables, we follow a mechanism called slowly changing dimensions type.
Eg: there is a customer table where it holds customer details. If there is any change, in SCD’s there should be a manipulation in the process.
Slowly changing dimensions [SCD’s]
Type -1 →
- New record inserts
- Old record update/modifies
Here in emp-history after type-1 performance is done the location of vinay is changed from the location HYD to USA (i.e., replaced).
Note: one customer only one location.
Type-2 (version mechanism)
- New records inserted with version ‘0’ (ZERO)
- Old records are inserted with increment version (i.e., either 1 or any)
Note: highest version indicates the current location of the customer.
Type-2 (status mechanism)
- New records inserted with status = “current”
- Old record a) inserted with status = “current”
b) modifies earlier records status to “expired”
Note: status=’current’ means the customer is in that location.
- New record inserted with start –date and end-date as ‘9999-12-31’
- Old records a)inserted with start–date and end-date as ‘9999-12-31’
b)modifies old record end-dt to new record start-date
Note: current location of customer identified with end-date=’9999-12-31’
For every customer previous/current locations maintained.
Note: if you want to increase the number of history rows those many columns should be added to the history tables, which is burden to the system.
In real time we use frequently type-2 (date mechanism) as history maintenance mechanism [see 66 page no for practical’s]
Maintenance cleanup task: It removes files left over from a maintenance plan.
Notify operator task: It sends an e-mail msg to any SQL server agent operator.
Rebuilt index task: It arranges or re-organizes data or the data index pages by rebuilding the indexes. This improves the performance of index scans and seeks.
Reorganize index task: It defragments and compacts clustered and non-clustered indexes on table and views.
Shrink database task: It reduces the disk space consumed by database and log files by removing empty data and log prices.
Update statics task: It updates statistics of the object if there are already collected.
Execute T-SQL statement task:
It execute T-SQL server database, command, queries…etc.
Go to execute T-SQL taskàrt clickàeditàexecute
T-SQL statement taskàconnectionàSRC (take any connection)
T-SQL statement: Use DB-MSBI;
Delete from EMP_NEW1;
Implementation of slowly changing dimensions (SCD) (through wizards):
In SSIS along with Type-1 and Type-2 we implement fixed attribute.
In this situation the important business information field taken as “fixed”. If there is any change for values in the column we can take appropriate actions like failing or ignoring the operation.
- Take OLEDB source.
- SCDàrt clickàeditànextà
Connection manager: DB_MSBI
Table or view: EMP_HIST
Specify the column EID as business key.
Set change type: fixed attribute
Select fail the transformation if changes are detected in a fixed attributeànextànextàfinish.
SCD Type-1: (changing attribute)
Like above with only below two changes
- Set dimensions column: party loc.
- Set change type: changing attribute.
SCD Type-2: (status mechanism)
Like above with only below two changes
Set change type: historical attribute
Select use a single column to show current and expired records.
Column to indicate current record: status
Value when current: current
Expired value: expired.
SCD Type-2: (date mechanism)
Select use start and end dates to identify current and expired records.
Start date column: start-date
End date column: end-date
Variable to set date values: system::container
SCD Type-2: (date mechanism manually):
In control flow task →
Rt click → edit
Cache mode: Full cache specify how to handle rows with no matching.
Entries: Redirect rows to no match output.
OLEDB connection manager: local host: DB-MSBI
Use table or view: EMP-HIST
Connect EID, ENAME, ELOC to HIST_table for validation click okàok.
rt click → Edit
Connection manager tab:
Component properties tab:
SQL command: update EMP-HIST set END-DT=? Where EID=? And END-DT is null.
Column names tab:
Connect EID to parameter-1, ST-DT to parameter-0 click ok → ok
OLEDB source: EMP-Daily
Lookup: rt clickàedit
Take OLEDB Destination as EMP-HIST and connect the fields except END-DT.
Data profiling task:
- Newly added in 2008.
- This task is helpful to profile the data before proceeding further.
- Generally it profiles the below information.
Go to data profiling taskàrt click → edit →
Profile request →
Profile type: candidate key profile request
Column length distribution profile request
Column null ratio profile request
Column pattern profile request
Column statistics profile request
Column value distribution profile request
Functional dependency profile request
Value inclusion profile request
Create ADO.Net connection for the source table
1. SSIS menu → New connection → ADO.Net → Add → Data connection: MSBI-DB
2. Data profile task on control flow
Rt click → edit
Destination type: file connection.
Destination: New connection → create file → c:\party-profile-request.
Click Quick profile →
ADO.Net connection: MSBI-DB
Table or view: party
Select the required options
Click ok → ok.
- Execute data profile task and observe the file.
- Start menu→program→SS 2008→Integration service →data profile viewers→open→specify file path (c:\party-profile-result) and monitor the analysis by selecting each attribute.
Note: In other ETL tools for profiling we need to go for 3rd party tools.
Enroll for Instructor Led Live Online MSBI Training