These are the SSIS practicals in MSBI
Solution (collection of projects) –> Project (collection of packages) –>Package (A discrete unit of work for doing ETL operations & Administrative tasks)
Start àPrograms –> SQL server 2008 –> click BIDSà File –> New –> Project –> Select ‘integration services project’ in templates window –> Enter name –> click ok
Project Name: Test_Solution_Project
Location : E:/Test
By default there is no solution is presented, so we can create a solution at the project creation time.
–> Click view menu –> Solution explorer (It describes the projects, packages, data source views, data source information)
↓
[It is used to connect to the DB it can be reusable “ACROSS packages”]
Data source view:
It is the logical object for the physical collection of tables or view in data sources.
Connection manager:
For every connection we can take a name, name can be reusable within the program (or) package.
In case of flat file: Folder the file name taken as connection string for connection manager.
In case of relation, source: Server name and data base name taken as part of connection string in connection manager.
Various ways of package execution:
a) By pressing “F5”
b) Solution explorer –> packages –> Right click –> Execute package
c) Debug menu –> Start debugging
d) SSMS –> Integration services –> MSDB –> package –> rtclick –> run package
e) By using DTUTIL, DTUTIL Exec.facilities …etc
White –> Ready to execute
Yellow –> Running
Red –> Fail
Green –> Success
Grey –> disable
We observe this information in “Progress tab” or “Log Providers”
(a) Progress tab information: It describes how the package validated and executed step by step from starting to ending. Generally it displays
(b) Log Providers: discussed later in this book
Variables
It is the value which is changeable within the package. There are two types of variables they are
1) System defined variables: These variables hold system information.
These variables store under SYSTEM
Syntax:SYSTEM: : < variable Name>
Ex: SYSTEM : : PACKAGE NAME
SYSTEM: : Execution ID
2) User Defined Variable: These variables are created by the user only. These variables store under user name space
Syntax: USER : : i
USER : : Name Var
USER : : Temp_date ..etc
Navigation:SSIS Menuàvariablesà in the variables window click the top most left corner option to create new variable.
Name Scope Data Type Value
l Package Int 32 1
Variable scope: The extent we use the variable is called the scope of variable. There are 2 scopes
a) Package Level: Within the package anywhere we can use this variable
b) Task Level: Within the task only we can use the variable.
To move the data from source to destination and to perform intermediate operations this task is mandatory. Frequently used sources are flat file source and OLE DB source and frequently used destinations are flat file destination and OLE DB destination.
Real time modes of flat file sources:
123 VINAY 123 VINAY
456 SIVAM 25 KISHORE
Fixed width Delimited (separated or Variable text)
ID –> 3 chars ID à Max 10 char
Name –> 5 char Name à Max 30 char
Row Delimiter:- ‘|’(pipe)
Column Delimiter:- ,(comma)
Row Delimiter:- CRLF(enter char) [carriage return line or forward]
Column Delimiter:- No
Eg:- Moving data from file to file (comma delimited)
Sol:- 1. Take data flow task on control flow
2. Go to data flow task à drag and drop flat file source & destination and do the below settings.
INPUT: Party_3RC –> notepad
File name: Party_3RC
No of rows: 13
–> Connection Manager Name is reusable in the packages.
IS Project 1:- Microsoft visual studio.
FLAT FILE Source:- FLAT FILE Destination:-
Select –> rtclick –> Edit select –> rtclick –> Edit
Flat file connection manager flat file connection manager –>
Click New click new –> select Delimited –> click ok
Connection manager name: SRC Connection manager name: TGT
File name: Browse to the i/p folder File name: Browse to the o/p folder
Party_SRC.Txt file. and specify party_output.txt
Format: Delimited Format: Delimited
Header Row Delimited: {CR}-{LF} Header Row Delimited: {CR}-{LF}
Header Row to skip: 1
Column names in the first data row: Column names in the first data row:
Check it Check it
Click preview –> click ok –> ok Click preview –> click ok –> ok, Go to mappings –> check whether required target columns – -> ok.
After assigning both source and destination
OLE DB:- Object Linking and embedding Data Base
Eg:- Moving data from table to table from source data base
DB_MSBI to MSBI_DB destination data base
Navigation: open BIDS à take data flow task in control flow
OLEDB source —-> OLEDB Destination
OLEDB –> Object linking and embedding data base –> Universal provider to any data base or application (excel..etc)
OLEDB source –> Rt click –> edit –>
OLEDB connection manager –> New –> New
Provider: native OLEDB Native SQL server client 10.0
Server name: local host
Select Authentication: windows (or) SQL server.
Select database: DB_MSBI
Click test connection –> ok –> ok –> ok
Name of the table or view –> emp –>ok –> ok
OLEDB Destination –> Rtclick –> edit
OLEDB connection manager –> New –>New
Provider: native OLEDB native SQL server client 10.0
Server name: local host
Select Authentication: windows (or) SQL server.
Select database: DB_VINAY
Click test connection –> ok –> ok –>ok
Name of the table or view –> click new –> change table name –>ok
Go mapping –> connect required source column to required from one work sheet in another excel
Eg: moving data from flat file to Raw file.
Raw file contains binary information, so we are not able to read.
–> raw file destination –> edit –> connection manager –> rtclick
Access mode –> file name
File name –> file path
Write always –> create always
Columns –> select columns –> ok
Eg: loading data from XML file to table in SQL server database.
XML_SCR.XML
001_Vinay
Vinay
< /Student 1>
< Student 1>
002_Siva
Siva
< /Student 1>
XML source –> rtclick –> edit –> data access mode –> XML location
OLEDB dest –> Generate XSD –>Ok.
Note: XML schema definition must be specified (or) generated to the corresponding XML file.
Frequently Asked MSBI Interview Questions & Answers
These are intermediate operations performed between source and destination.
Eg: concatenation, addition, sorting, merging etc.
There are different data flow transformations provided for different operations.
It sorts the data in the specified order (ascending or descending)
It has some flexibility to do sorting on multiple columns (by giving sort order)
There is a feature called “remove rows with duplicates” which helps us to display unique rows in sorting by eliminating duplicates.
Eg: display the data by location in ascending order, with in the location names in descending order.
–> Rc –> EDIT Available input columns –> select columns
??
It performs operation row by row. It does different calculations, aggregation, concatenation, transformations, conversions etc for the columns in the rows.
Eg:
a) Display name and location by concatenation.
b) Display income, if it is null 99999.
c) Display income increment by 12%.
d) Display a new field with current date as business date.
e) Display default company code as 21000.
Note:
I. When we retrieve the data from flat file, all the columns belong to string data type (DT-STR).
II. When we retrieve the data from excel sheet, all the file it’s belong to (DT-WSTR).
Data conversions:
Data conversions are done in three ways.
a) By using data conversion transformation.
b) By using type cast operator in expression.
Syntax:
(column name).
(DT-I4) (PartyIncome)
(DT-DBDATE) (‘2010-10-10’)
c) Directly doing at flat file source itself.
Rt click on flat file source –> show advanced editor –> input and output properties –> flat file source O/P –> output columns –> party income –> data type properties –> data type →four-byte signed integer(DT-I4)
Aggregate Transformation:
It performs aggregate operations such as average, sum, count, min, max, count distinct, group by
Eg: display location wise income sum and average.
Flat file source addition properties:
àRetain null values from the source as null values. It helps us to treat nulls coming from the source as nulls only. (if we uncheck this option null treated as zero for integers, space for string)
Error output options:
In case of error or truncated values coming from source we can use either of the below options.
a) Ignore failure → in case of error or truncations it ignores the failure.
b) Redirect row → in case of error or truncation it redirect the row to the another destination.
c) Fail component → in case of error or truncation it simply fails the component.
Navigation:
Flat file source àrtclickàeditàerror O/P.
Flat file destination files format:
OLEDB Source properties:
OLEDB Source àrtclickàeditàdata
Access mode:
Note: we use variables for dynamic source retrieval.
Multicast Transformation:
It creates multiple copies of some source. So that instead of doing multiple operations on the same source in multiple package we can do in a single package by using multicast.
→ It improves the performance because it reads the data only one time from the source.
Package-1 package-2 package-3
EMP EMP EMP
Advantages of single package split
It Merges multiple input data source here the restriction is the sources should be the sorted order so that the output will also be in the sorted order.
There are two ways to implement
If the source are not in sorted order do the below process
Conditional split transformation:
a) It splits the data based on the condition.
b) There are two types of output comes from this transformation.
Conditions matched output.
Conditions unmatched output or default output.
Eg: move HYD, Bang, data to separate files and unmatched data to another file.
→ Conditional split → rt clickà
ORDER OUTPUTNAME CONDITION
1) HYD-DATA [PARTYLOC]=”HYD”
2) BLORE-DATA [PARTYLOC]=”BLORE”
Default output name → Unmatched-data → Ok
Union all transformation:
Limitations:
1) Input sources structures should be same. [No.of columns, order of data type of columns]
OLEDB Source Editor Properties
Data access mode:
Note: this variable generally recommended at the time of dynamic retrieval of data.
Data access mode:
i) Table or view
ii) Table or view-fast load → it loads the data very fast compared to normal view or table loading.
During the fast load there are couples of options we must select according to the situation.
Rows per batch 1000
Keep identify uses identity columns generated values.
Note: This fast load option is useful when the table is having clustered index.
iii) Table name or view name-fast load
iv) SQL command.
Question: In data accessing mode after selecting table or view option and clicking new button to create a table, then when does the table will be created.
Sol: After writing changes
Immediately table created (but not at execution time)
1) Write expression when they are small because too many expressions and complex expressions decrease the performance.
2) Generally we use expressions in various places.
As part of expressions there are many functions, typecasts and operators available.
Mathematical function: [Mathematical operations on numerical values]
ABS, CEILING, EXP, FLOOR, LOG, LN, POWER, ROUND, SIGN, SQUARE, SQRT.. etc.
Eg: Round (4.82) → 5 ABS (4.82) → 5 ceiling (4.82) → 5
Round (4.26) → 4 ABS (-3.92) →4 floor (4.82) → 4
String Function: [Manipulates string columns/expressions.]
LENGTH, LOWER, LTRIM, REPLACE, CODEPOINT, FINDSTRING, HEX, REPLICATE, REVERSE, RIGHT, SUBSTRING,….etc.
Eg: LOWER (“ABC”) →abc
LTRIM (“abc”) → abc
–> It removes the space.
TRIM(“abc”) →abc
REVERSE (“abc”) → cba
FIND STRING (“VINNNAY”,”N”, 3) → 5 [3rd position]
REPLICATE (“a”, 3) → aaa
REPLACE (“VINAY”, ”Na”, “NNa”)
→ VINNAY
SUBSTRING (“VINAY”, 2, 2) 2
2 point string length 2 → ”IN”
FIND STRING (“VINAY”, N) →
Date/Time function: [To work with day, month, year etc.. In date expressions]
DATE ADD, DATE DIFF, DATE PART, DAY, GETDATE, GETUTCDATE → (universal time coordinate), MONTH, YEAR
DAY (DT-DATE) “2011-09-04” –> 04
MONTH (DT-DATE) “2010-09-04” –> 09
DATE ADD (“MONTH”, 4, (DT-DATE) “2010-05-04”) –> 2010-09-04.
DATE ADD (“MONTH”, 4, JDATE)
DATE DIFF (“MONTH”, (DT-DATE)”2011-07-09”, (DT-DATE)”2011-04-09”)
03months
“DAY” → 90 days
“YEAR” → 0 years
Null functions: [validate null argument]
a) Is null (expression) –> result/true or false
b) Null (DT-DATE) –> null [useful to display “null date” value]
Type cast: [convert one data type of other data type.]
Eg: (DT-Numeric,2,6) (123456)à1234.56.
Operator:
&&———–logical AND
||————logical OR
?: –> writing if condition
?:
Eg: is null (party name)? “unknown:” TRIM (party name)
Display expression if date is null or the date length is zero or the date is having null date, display null otherwise display date.
IS NULL (JDATE)||len (trim((DT-WSTR,10)JDATE))==0||JDATE==”00-00-0000”? NULL(DT-DATE):JDATE
List: ADD, concatenates (+), subtract, Negate, Multiply, Divide, modulo (%), parentheses ([]), equal (==), Unequal (!=), Greater (>)..etc
FILENAME-YYYY-MM-DD-HHMMSS.TXT
CDATEà”2011-08-09 04:23:22:00000
“File Name-“+
Substring ((DT-WSTR, 30) CDATE, 1, 10) +”-“+
Substring ((DT-WSTR, 30) CDATE, 12, 2) +
Substring ((DT-WSTR, 30) CDATE, 15, 2) +
Substring ((DT-WSTR, 30) CDATE, 18, 2) +”.TXT
Note:
Fast load option is useful when the table is having clustered index.
In case sorted input sources:
0 –> Un sorted
1 –> Sorted
Output columns –> party ID –> data types: change DT-I4 sort key position:1
Flat file source error output –> is sorted: true
Output column –> flat file source error output column sort key option: 1 ok –>ok
Note: merge works with two source at a time
Different between merge and union all
Merge Union all
Only two sources No limit
Input should be sorted Not applicable
Sorted result Not applicable
Merge join:
It performs merge operation along with joins. Generally it supports the below joins.
a) Inner join
b) Left join
c) Full join
Emp table
DID DNAME NULL
10 IT ?
40 HR ?
Dept Table
EID ENAME DID
1 A 10
2 B 20
Query structure:
Select cols/* from cross join no where condition.
Inner join
ON
Left join
ON
Right join < Table B >ON
Full outer join
ON
Eg: select E.EID, E.ENAME, E.EID, D.DID, D.DNAME from EMP E
Cross join dept D
Inner join dept D ON E.EID=D.DID
Left outer join dept D ON E.EID=D.DID
Right outer join dept D ON E.EID=D.DID
Full outer join dept D ON E.EID=D.DID
Output:
Cache creation:
OLEDB source [Dept Table]
↑↓
Cache transformation –> RC –> edit –> cache connection manager new –> connection mgr name; Dept cache
Browse –> Output folder –> DBP _cache .CAW –> columns –> DEPT ID –> index position –> ok –> ok –> run(Package and see cache generate)
–> Merge join –> rt click –> merge join transformation editor –> join type: left outer join
Look up transformation:
It look up the required values on target and fetches relevant result. [Exact result]
Real time usage:
Types of caches:
No cache:
Here there is no cache to the target table. So every time source query hits the database and fetches the result.
Advantage:
If the source data changing frequently and less no.of records are there it is recommended.
Draw backs:
1. Hits on the target increases and traffic also high.
Full cache: Here there is a cache for the target table, so every source request goes to the cache and fetches the data.
Adv:
If the target is not changed and having more records.
Partial cache:
Initially the cache is empty for every new request source query hits the database and fetches the information to the cache. For existing records, source query hits the cache.
Adv:
More and more new records are added to the destination there is huge usage in existing records as well.
Look up result:
If there is no match in the look up we can go for either of the below (ways) options.
a) Ignore failure
b) Redirect rows to error output
c) Fail component
d) Redirect rows to no match O/P.
If the source is having multiple matches in the destination it returns “First Match”
Lookup performance improvement:
a) Increase or decrease the cache memory according to the target table size, because more rows with the big size cache gives bad performance.
b) Instead of taking table, take an SQL query to have required no.of rows and columns in the cache and to perform lookup operations.
Lookup –> rt click –> edit –> connection –> mark use results of an SQL query option and write this type of customized query.
“SELECT DEPT ID, DEPT NAME From DEPT where DEPT ID in [10,20]”
Eg: Retrieving the dept id, name from dept (using full cache) table based on the match from emp table.
Matched records in one destination and
Unmatched records to another destination.
Sol: a) OLEDB source (Emp table)
b) Lookup –> Rc –> Edit –> general select full cache
Specify how to handle rows with no matching entries i.e., redirect rows to no match output
Connection
OLEDB connection manager:
DB-MSBI
Use table or view:
Columns
Connect DID from EMP to DEPT and select DID, DName from Dept.
Ok
Take two destinations and connect matched result to one destination and unmatched result to another destination.
Note:
a) Unmatched result destination structure is like source table only and it contains source unmatched records.
b) Lookup operations we can perform only on relational tables (we can’t perform on flat files)
Working with partial cache:
In the above steps do the below changes
Navigation:
Lookup –> Rc –> Edit –>
General
Select partial cache.
Advanced
Take required amount of memory if required enable the cache for no matching entries options.
Note: the cache created and dropped automatically
Creating the named cache:
Navigation:
Start –> programs –> open BIDS –> New –>Project –> package –>control flow task –> data flow task –> Drag or select the OLEDB source –> cache transform –> steps page no 340)
Using the preached or named cache for operations:
To do the named cache operations, take some changes in the “lookup” i.e.,
Navigation: lookup –> rt click –> edit –> lookup editor
Properties –> general
Cache mode: Select full cache
Connection type: Cache connection manager
–> Specify how to handle rows with no matching entries Redirect rows to no match output.
Go to connection:
Cache connection manager –> new –> name –>
Check use cache file name
Ok
Ok
Check columns, ok, ok
Fuzzy lookup transformation:
This transformation is designed to get the result from the destination based on the similarity but not exact match.
While doing the operation three more columns added for destination.
a) similarity:
It displays how much similar source row with destination row. (Column)
b) similarity-column name:
It display how much similar each source column with each destination column.
c) confidence:
How much the system is having confidence to give the result?
Note: generally we go for similarities for string values.
Eg:
EMP – ADDRESS TABLE
EID ENAME
1 Madhu Mohan
2 Vinay Krishna
8 Rajesh Yadav
5 Ramana
6 Murali Mohan
Navigation:
Fuzzy lookup:
Reference table
OLEDB connection manager: DB-MSBI
Table or view: Emp-address
Columns:
Connect ename from Emp to Ename in Emp-address and select EName and E-address columns from Emp-add
Ok Ok.
4. Take two destinations and connect each condition.
5. Run package.
Pivot transformation:
Un pivoted column:
It doesn’t participate in pivoting.
Pivoted columns:
Columns values which are converted from rows to columns.
Pivoted value: the values which are moved to pivoted columns.
Column name Pivoted key usage
Party ID 1àun pivoted
SAL component 2àpivoted columns
SAL amount 3àpivoted values
Lineage ID:
It is a unique ID taken by a system for every column which is mapped.
Navigation:
Input columns: select all columns
Input output properties:
Pivot default input:
Do the below settings
Party ID –> pivot usage –> 1
SAL component –> pivot usage –> 2
SAL amount –> pivot usage –> 3
Note:
Identify lineage ids of party ID, sal component, sal amount
Eg:
Column name lineage id
Party id 349
Sal component 371
Sal amount 375
Pivot default O/P:
Audit:
It display audit information for every row coming from source or it adds audit information to the source data.
Eg: Audit types:
Character map:
It applies string operations such as lower to upper and vice versa etc.
Copy column:
It creates multiple copies of the column.
Export column:
It export column value from rows in data set to a file.
Eg: exporting, image from column to a file.
Import column:
It imports in to column, values from a file
Eg: loading images from file to table rows.
Fuzzy grouping:
It groups similar rows.
OLEDB command:
It executes an SQL command for each row in a data set.
Percentage sampling:
It takes sample percentage no.of rows from source data set.
Eg: taking 20% of sample rows from a dataset.
Row sampling:
It display the specified no.of sampled rows.
Eg: display 10,000 sample rows from a table.
Row count:
It count the no.of rows in dataset.
Un pivot:
It converts columns information iTerm lookup:nto rows i.e, it creates move a normalized representation of dataset.
Script component:
It executes a customs script (VB.Net) or (C#.Net)
Term extraction:
It extracts terms from a data to column.
It counts the frequencies that terms in a reference table appears in a dataset.
SSRS | Power BI |
SSAS | SQL Server |
SCCM | SQL Server DBA |
SharePoint | BizTalk Server |
Team Foundation Server | BizTalk Server Administrator |
Name | Dates | |
---|---|---|
MSBI Training | Sep 17 to Oct 02 | View Details |
MSBI Training | Sep 21 to Oct 06 | View Details |
MSBI Training | Sep 24 to Oct 09 | View Details |
MSBI Training | Sep 28 to Oct 13 | View Details |
As a content writer and storyteller, Raunaak Mitra regards himself to be a prodigy in writing. He firmly believes that language is borderless, and anyone can write as long as they have a strong narrative capability and the ability to emote feelings into words. Authors like F. Scott Fitzgerald and R. K. Narayan have influenced him to keep the writing as simple as possible for anyone to understand. Other than being well-versed about technological trends like AI, Machine Learning, AR, VR, Gaming, Microprocessors, Cloud Computing, Industry 4.0, literally any technological advancement (old or new), he also shares a knack to curate fiction on sci-fi, satire, and thriller genres.