SSIS Interview Questions
SSIS Interview Questions
Q. What is SSIS?
SSIS was first introduced with SQL Server 2005, which was the next generation of SQL Server software after SQL Server 2000. SSIS is an ETL tool that handles data Extraction, Transformation and load. In addition, it can handles more tasks besides the ETL tasks such as data profiling, file system manipulation etc…. If you know C#.net or VB.net, you can use the script task to perform much more efficient batch operations.
Q. What is SSIS control flow integration?
SSIS control flow allow you to program graphically how the tasks will run by using the logical connectors between tasks. there are three basic logical connectors that you can use: success, failure, or complete. Further more, you can use the FX (expression) to handle more complex conditions in your control flow.
Q. What is Data Transformation?
The data transformation stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the end target.
Q. What kind of variables can you create?
You can create global variables and task level variables in SSIS. For programmers, these variables are the same as global and function level variables. A global variable is available to all tasks across the entire job. Variables created in tasks are only available within that task.
Q. What kind of containers can you use with SSIS packages?
There are three types of containers: sequence, for loops and for each loops.
A sequence container is a simple way to group similar taks together. Think of a sequence container as an organization container for more complex SSIS packages.
A for loop container is what you can use to execute your tasks to a certain number of times. For example, you need to update records ten times, you can place the task that update the records inside this for loop container and specify 10 as the end of the loops. by using the for loop container, you don’t have to create neither ten different packages to do the same task, or have to run the whole package ten times when you schedule your job.
A for each loop container will be useful when you don’t know a head of time how many times a task should perform. for instance, let’s say that you want to delete all the files inside a folder, but you don’t know how many files are there at any particular time, by using for each loop, it can go through the collection of files and delete them for you, after the collection is emptied out, it knows that when it should stop.
Q. How to create the deployment utility?
Deployment is the process in which packages converts from development mode into executables mode. For deploying the SSIS package, you can directly deploy the package by right clicking Integration Services project and build it. This will save the package.dtsx file on the project\bin folder. Also, you can create the deployment utility using which the package can be deployed at either SQL Server or as a file on any location.
For creating deployment utility, follow these steps:
1. Right click on project and click on properites.
2. Select “True” for createDeploymentUtiltiy Option. Also, you can set the deployment path.
3. Now close the window after making the changes and build the project by right clicking on the project.
4. A deployment folder will be created in BIN folder of you main project location.
5. Inside the deployment folder, you will find .manifest file, double clicking on it you can get options to deploy package on SQL Server.
6. Log in to SQL Server and check in MSDB on Integration Services.
Q. What is Manifest file in SSIS?
Manifiest file is the utility which can be used to deploy the package using wizard on file system and SQL Server database.
Q. What is File system deployment?
File system deployment means to save pacakge file on local or network drive. Then you can use SQL Agent job to schedule when the packages will run.
Q. How to back up or retrive the SSIS packages?
If your package is deployed on SQL Server then you can back up the MSDB database as all the package on SQL server deploys at MSDB.
Q. What is data flow task in SSIS?
Data flow task is the pumping mechanism that moves data from source to destination. However, in the case of SSIS, you have much more control of what happens from start to finish. In fact, you have a set of out of the box transformation components that you snap together to clean and manipulate the data while it is in the data pipeline. Just as the control flow handles the main workflow of the package, the Data Flow handles the transformation of data. Almost anything that manipulates data falls into the Data Flow Category.
Q. What is data Profiling task?
Data profiling is the process of analyzing the source data to better understand what condition the data is in, in terms of cleanliness, patterns, numbers or nulls, and so on. data profiling task usually be used at the beggining of the development cycle to support the design of the destination database schema. Note that this task is not used when you develop the normal recurring ETL packages.
Q. What is the multicast Transformation in SSIS?
The Multicast transform, as the name implies, can send single data input to multiple output paths easily. You may want to use this transformation to send a path to multiple destinations sliced in different ways. The multicast transformation is similar to the Split Transformation because both send data to multiple outputs. However, you can not specify the conditions for which part of the data will be in which output in the Multicast transformation.
Q. Difference between Merge and Union All?
The Merge transformation can merge data from two paths into a single output. The Transform is usefull when you wish to break out your Data Flow into a path that handles certain errors and then merge it back into the main Data Flow downstream after the errors have been handled. it’s also useful if you wish to merge data from two Data Sources.
Note that the data must be sorted before using the Merge Transformation. you can do this by using the sort transformation prior to the merge or by specifying an ORDER By clause in the source connection. Also, the metadata must be the same for both paths. For example, CustomerID column cannot be a numeric column in one path and a character column in the other path.
The Union All Transformation works much the same way as the Merge Transformation, but it does not require the data to be sorted. It takes the outputs from multiple sources or transforms and combines them into a single result set.
Q. OLE DB Command Transform?
The OLE DB Command Transform is a component designed to execute a SQL Statement for each row in an input stream. This task is analogous to an ADO Command Object being created, prepared, and executed for each row of a result set. The input stream provides that data for parameters that can be set into the SQL Statement that is either an In – line statement or a stored procedure call.
Q. Execute package task?
The Execute Package Task enables you to build SSIS solutions called parent packages that execute other packages called “Child Packages”. You’ll find this capability an indispensable part of your SSIS development as your packages begin to grow.
Separating packages into discrete functional workflows makes for shorter development and testing cycles and facilitates best development practices. in SSIS, the child package is aware of the parenment package parameters and can reach out and access those parameters – level configuration values. The majority of the configuratble properties are in the Package tab of the Executable package Task Editor.
The first option is to provide the location of the child package. The options here are either File system and SQL Server. You can deploy SSIS package in the FIle system task as an .dtsx file, or within the msdb database of a SQL Server instance. if you select file system, you must first create a new connection manager connection to the child package. if the child package is located in a SQL Server, you’ll need to provide the OLE DB Connection Manager for the SQL Server that holds your packages. In either case, browse to and then select the child package within the connection to set the package to execute in the task.
Q. What is transaction in SSIS package and how to implement it?
Packages use transactions to bind the database actions that tasks perform into atomic units. and by doing this maintain data integrity. Al MS IS container types – packages the For loop, For each loop, and Sequence containers, and the task hosts that encapsulate each task can be configured to use transactions. IS provides three options for configuring transactions: Not supported, Supported, and Required.
Require indicates that the container start a transaction, unless one is already started by its parent container. if a transaction already exists, the containter joins the transaction For example, if a package that is not configured to support transactions includes a Sequence container that uses the Required option, the Sequence Container would start its own transaction. If the package were configured to use the Required option, the Sequence containter would join the package transaction.
Supported indicates that the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Executable SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any taks fails. if the package does not start a transaction, the four execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back
Not Supported indicates that the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured to Not Support transactions. For instance, if a package is configured to start a transaction and a For Loop Container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.
Q. Difference between Execute TSQL Task and Execute SQL Task
In SSIS there is one tasks Execute TSQL task which is similar to Execute SQL task. Will see what is the difference between two.
Execute TSQL Task:
Pros: Takes less memory, faster perfomance
Cons: Output into variable not supported, Only supports ADO.net connection
Execute SQL Task:
Pros: Support output into variables and multiple types of connection, parameterized query possible.
Cons: Takes more memory, slower performance compared to TSQL task.
Q. Precedence Constraints
A task will only execute if the condition that is set by the precedence constraint preceding the task is met. By using these constraints, it will choose different execution paths depending on the success or failure of other tasks.
Success – Workflow will proceed when the preceding containter executes successfully. Indicated in control flow by a solid green line.
Failure – Workflow will proceed when the preceding container’s execution results in a failure. Indicated in control flow by a solid red line.
Completion – Workflow will proceed when the preceding container’s execution completes, regardless of success or failure. Indicated in control flow by a solid blue line.
Expression/Constraint with logical AND – workflow will proceed when specified expression and constraints evaluate to true. Indicated in control flow by a solid color line along with a small ‘fx’ icon next to it. Color of line depends on logical constraint chosen (e.g. success = green, completion = blue).
Q. What is the use of config file in SSIS?
Config file in SSIS is used to provide inputs to connection manager different properites which package use to assign values at run time dynamically.
Using config file user need to make changes to the config file which package will take automatically at run time because of using it you don’t need to every time make changes in the packages in case you are deploying package on multiple servers or locations. There are multiple ways in which configuration values can be stored.
XML configuration file: Store the config file as an XML file..
Environment variable Store the config in on of the environment variables.
Registry entry Store the confi in the registry
Parent package variable Store the config as a variable in the package that contains the tasks.
SQL Server Store the config in t a table in SQL Server
Q. Different between Control Flow and Data Flow?
a. Control flow is for designing the flow of the package. Data flow is for ETL process.
b. Data Flow is the subset of control flow
c. There will be only one control flow while multiple dataflow can exists.
d. Data flow cannot work without a control flow
All process base taks are part of control flow while ETL related tasks are the part of Dataflow which is again a subset of control flow.
Q. Check point?
Checkpoint is the property in SSIS which enables the project to restart from the point of failure. When we set the property to true package create the checkpoin file which stores the information about package execution and use to restart package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time the package runs.
Q. Can I run SSIS packages with SQL Server Express or Web or Workgroup editions?
I have looked at the SQL Server 2008 feature comparison matrix and it lists the express/web and workgroup editions as having the SSIS runtime. Does this mean it is possible to develop SSIS packages using the developer edition, and then deploy and run them on a server running one of the lowly SQL Server editions such as SQL Server 2008 Express edition?
A: You need dtexec to run SSIS packages from command line.
In SQL2005 Express you got dtexec and necessary tools with
- SQL Server 2005 Express Edition with Advanced Services
- Microsoft SQL Server 2005 Express Edition Toolkit
However, without SQL Agent you have to use other scheduling methods.
More information: http://daron.yondem.com/default.aspx?month=2006-05
Caveat: I haven’t actually tried this and there are reports that dtexec just fails with a licence error. In my case, I just needed to run a dtsx once and did it with Developer edition with SQLExpress as data flow destination.
Q. How can I manually fail a package in Integration Services?
I am running an Execute SQL Task statement in my SSIS package. The Execute SQL Task is running sql and checking that the tables have more than 1000 rows. If they have less than 1000 rows, I want to fail the package.
How do I force a fail inside of a SQL statement?
A: AFAIK, tasks in SSIS fail on error. So if your Execute SQL Task has a statment like so in it:
declare @count int
select @count = select count(*) from my_table
if @count < 1000
raiserror(‘Too few rows in my_table’,16,1)
— Process your table here
You should get the results you want.
Q. No Process Is on the Other End of the Pipe
I receive this error when I try to connect to SQL Server 2005. I have enabled TCP/IP, Named Pipes, and restarted the server but that is not working.
A: FYI, I’ve just had the same error.
I switched to Windows authentication, disconnected, then tried to login again with SQL authentication. This time I was told my password had expired. I changed the password and it all worked again.
Q. What is the SSIS package and what does it do?
Can someone tell me what the SSIS package is all about?
I am totally new to data-warehousing and I am going crazy with the terms SSIS, DTL, ETL and so on. I am looking for a big picture and how it fits in the overall .NET/SQL Server architecture.
A: SSIS (SQL Server Integration Services) is an upgrade of DTS (Data Transformation Services), which is a feature of the previous version of SQL Server. SSIS packages can be created in BIDS (Business Intelligence Development Studio). These can be used to merge data from heterogeneous data sources into SQL Server. They can also be used to populate data warehouses, to clean and standardize data, and to automate administrative tasks.
SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005. It replaces Data Transformation Services, which has been a feature of SQL Server since Version 7.0. Unlike DTS, which was included in all versions, SSIS is only available in the “Standard” and “Enterprise” editions. Integration Services provides a platform to build data integration and workflow applications. The primary use for SSIS is data warehousing as the product features a fast and flexible tool for data extraction, transformation, and loading (ETL).). The tool may also be used to automate maintenance of SQL Server databases, update multidimensional cube data, and perform other functions.
Q. Running SSIS packages in separate memory allocations or increasing the default buffer size?
I have a SSIS package that has a child package that is failing. The error message isn’t very helpful.
The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020
The problem seems to be I am running out of Virtual Memory to complete the job.
I found a forum thread that may help solve the problem.http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/d6d52157-0270-4200-a8c2-585fa9a0eed5/
From the solutions offered I am unsure though how to:
- increase default buffer size
- Allocate a child pacakage into it’s own memory allocation.
I am running the package daily in SQL Server 2005. I was running fine daily up until the 12th. I am assuming the dat file that we are using to import data into the database grew to a size that was to large for the database to handle. It’s only a 8.90MB csv file though. The import is a straight column to column import.
The problem child package is step 1 and fails and continues and successfully completes the next 8 steps.
A: How much memory is allocated to SQL Server? How much memory is allocated outside of the SQL Server process space?
The reason I ask is becuase SSIS memory is allocated from the memToLeave area of memory that sits outside of the SQL Server process space.
See here for details on configuring the amount of memory available to the memToLeave portion of memory.
For generic performance tuning of SSIS consult the following article.
I hope this makes sense but feel free to drop me a line once you have digested the material.
Q. How to schedule SSIS package to run as something other than SQL Agent Service Account
In SQLServer 2005, is it possible to schedule an SSIS package to run something other than the AQL Agent Service Account?
I’ve got an SSIS package that makes a connection to a db and runs a stored procedure. My criteria is that I will not specify user names/passwords in a package or package configuration, so I want to use integrated authentication.
The problem is that be default a step in a job runs as SQL Agent Service Account, and our server group does not want to grant that account execute rights on the stored procedures that my package will execute. So we’re trying to find a way to specify a different account in the step, so the package will run under a different context. is that possible?
A: If you have access to SQL Server Agent through SQL Server Management Studio, here are the steps to create a job using the Graphical User Interface. The steps show how to create an SQL job to run SSIS using SQL Agent Service Account and also how to create a proxy to run under a different using different credentials.
- Go to SQL Server Management Studio. ExpandSQL Server Agent and right-click on Jobs, then select New Job… as shown in screenshot #1.
- Provide a name and Owner by default will be the account that creates the job but you can change it according to your requirements. Assign a Category if you would like to and also provide a description. Refer screenshot #2.
- On the Steps section, click.. as shown in screenshot #3.
- On the New Job Step dialog, provide a Step name. SelectSQL Server Inegration Services Package from Type. This step will run under SQL Agent Service Account by default. Select the package source as File system and browse to the package path by clicking on ellipsis. This will populate the Package path. Refer screenshot #4. If you don’t want the step to execute under theSQL Agent Service Account, then refer the steps #8 – 9 to know how you can use a different account.
- If you have a SSIS configuration file (.dtsConfig) for the package, click on theConfigurationstab and add the Configuration file as shown in screenshot #5.
- Click OK and there is the package in step 1 as shown in screenshot #6. Similarly, you can create different steps.
- Once the job has been created, you can right-click on the job and selectScript Job as –> CREATE To –> New Query Editor Window to generate the script as shown in screenshot #7.
- To run the SSIS step under different account, on the Management Studio, navigate toSecurity –> right-click on Cedentials –> select New Credential… as shown in screenshot #8.
- On theNew Credential dialog, provide a Credential name, Windows account and Password under which you would like to execute SSIS steps in SQL jobs. Refer screenshot #9. Credential will be created as shown in screenshot #10.
- Next, we need to create a proxy. On the Management Studio, navigate toSQL Server Agent –> Proxies –> right-click on SSIS Package Execution –> select New Proxy… as shown in screenshot #11.
- On the New Proxy Account window, provide a Proxy name, select the newly created Credential, provide a description and select SQL Server Integration Services Package as shown in screenshot #12. Proxy account should be created as shown in screenshot #13.
- Now, if you go back to the step in SQL job, you should see the newly created Proxy account in theRun asdrop down. Refer screenshot #14.
Hope that helps.
Q. overhead of varchar(max) columns with small data
As part of a bulk load of data from an external source the stageing table is defined with varchar(max) columns. The idea being that each column will be able to hold whatever it finds in the source CSV file, and that we’ll validate the data (for type, size, percision etc) later.
But I’m concerned that the varchar(max) column has a lot of overhead for columns that have less than 200 characters. The fellow that designed this assures me this is best practice for ETL but I thought I would validate that assertion with the community.
A: VARCHAR(MAX) column values will be stored IN the table row, space permitting. So if you have a single VARCHAR(MAX) field and it’s 200, 300 byte, chances are it’ll be stored inline with the rest of your data. No problem or additional overhead here.
Only when the entire data of a single row cannot fit on a single SQL Server page (8K) anymore, only then will SQL Server move VARCHAR(MAX) data into overflow pages.
So all in all, I think you get the best of both worlds – inline storage when possible, overflow storage when necessary.
PS: As Mitch points out, this default behaviour can be turned off – I don’t see any compelling reasons to do so, however….
Q. ssis connection manager not storing sql password
I used to have a dts that had sql server authentication connection. Basically the userid password is stored in the package itself. Now when I go to SSIS the password is not getting stored to the package. I saw this when I google the problem. http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/c720e694-2f58-483a-9cd7-3feb7de2db7b but no one seem to have given a good resolution. Can you any of help please? Thanks in advance
A: That answer points to this article: http://support.microsoft.com/kb/918760
Here are the proposed solutions – have you evaluated them?
Method 1: Use a SQL Server Agent proxy account Create a SQL Server Agent proxy account. This proxy account must use a credential that lets SQL Server Agent run the job as the account that created the package or as an account that has the required permissions.
This method works to decrypt secrets and satisfies the key requirements by user. However, this method may have limited success because the SSIS package user keys involve the current user and the current computer. Therefore, if you move the package to another computer, this method may still fail, even if the job step uses the correct proxy account. Back to the top
Method 2: Set the SSIS Package ProtectionLevel property to ServerStorage Change the SSIS Package ProtectionLevel property to ServerStorage. This setting stores the package in a SQL Server database and allows access control through SQL Server database roles. Back to the top
Method 3: Set the SSIS Package ProtectionLevel property to EncryptSensitiveWithPassword Change the SSIS Package ProtectionLevel property to EncryptSensitiveWithPassword. This setting uses a password for encryption. You can then modify the SQL Server Agent job step command line to include this password.
Method 4: Use SSIS Package configuration files Use SSIS Package configuration files to store sensitive information, and then store these configuration files in a secured folder. You can then change the ProtectionLevel property to DontSaveSensitive so that the package is not encrypted and does not try to save secrets to the package. When you run the SSIS package, the required information is loaded from the configuration file. Make sure that the configuration files are adequately protected if they contain sensitive information.
Method 5: Create a package template For a long-term resolution, create a package template that uses a protection level that differs from the default setting. This problem will not occur in future packages.
Q. Advantages of using SSIS packages over stored procedures?
If I can do the required ETL requirements using stored procedures, any advantages of using SSIS packages instead? My ETL stuff is nothing major.
I feel like using an old technology. I like SQL. Old technology does not equal obsolete as stored procedures won’t go away any time soon.
A: If your ETL is mostly E and L, with very little T, and if you can write your SPs so they don’t rely on cursors, then going the SP-only route is probably fine.
For more complex processes, particularly those that involve heavy transforms, slowly changing dimensions, data mining lookups, etc, SSIS has three advantages.
First, it manages memory very efficiently, which can result in big performance improvements compared to T-SQL alone.
Second, the graphical interface lets you build large, complex and reliable transforms much more easily than hand-crafted T-SQL.
And third, SSIS lets you more easily interact with additional external sources, which can be very handy for things like data cleansing.
Q. How do you get an SSIS package to only insert new records when copying data between servers
I am copying some user data from one SqlServer to another. Call them Alpha and Beta. The SSIS package runs on Beta and it gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta’s table. Pretty simple and that works great.
The problem is that I only want to add new rows into Beta. Normally I would just do something simple like….
INSERT INTO BetaPeople
SELECT * From AlphaPeople
where ID NOT IN (SELECT ID FROM BetaPeople)
But this doesn’t work in an SSIS package. At least I don’t know how and that is the point of this question. How would one go about doing this across servers?
Your example seems simple, looks like you are adding only new people, not looking for changed data in existing records. In this case, store the last ID in the DB.
CREATE TABLE dbo.LAST (RW int, LastID Int)
INSERT INTO dbo.LAST (RW, LastID) VALUES (1,0)
Now you can use this to insert the last ID of the row transferred.
UPDATE dbo.LAST SET LastID = @myLastID WHERE RW = 1
When selecting OLEDB source, set data access mode to SQL Command and use
DECLARE @Last int
SET @Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > @Last;
Note, I do assume that you are using ID int IDENTITY for your PK.
If you have to monitor for data changes of existing records, then have the “last changed” column in every table, and store time of the last transfer.
A different technique would involve setting-up a linked server on Beta to Alpha and running your example without using SSIS. I would expect this to be way slower and more resource intensive than the SSIS solution.
INSERT INTO dbo.BetaPeople
SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
Q. Does anyone have any good SSIS Best-Practices tips and suggestions?
I would interested to hear from other developer tip on SSIS best practices, dos and don’ts, pitfalls and tricks, etc.
I’m trying to introduce a few here at my company.
A: There is a 4000-character limit to the DT_WSTR (Unicode string) data type used by SSIS. Since expressions are converted to DT_WSTR, they also cannot exceed 4000 characters.
You’ll get an error if you exceed the limit at design time. However, you will not receive any obvious warnings or errors if you do so at run time. The operation will fail silently, and the expression may be evaluated incorrectly.
Note: this does not mean that strings or SSIS variables cannot exceed 4000 characters. The limitation only applies to expressions and other places that DT_WSTR is used. For a good explanation of this, take a look at:
Here’s an example that reproduces the issue:
- Create an SSIS string variable named testVariable.
- Assign the variable a long value in a script task (5000 characters, for instance). The assignment should succeed.
- Create a second task and connect the two tasks using a precedence constraint. Set the evaluation operation to “Expression” and set the expression to @testVariable != “”.
Even though the variable is not empty, the constraint will incorrectly evaluate to False because the expression exceeds 4000 characters. However, this will not cause any errors and the package will not fail.
If you notice strange behavior in your SSIS logic, take a look at your expressions and ensure that they don’t exceed 4000 characters when evaluated.
Q. SSIS SQL Task – “Parameter name is unrecognized”
I have a SQL Task that needs to run a simple update to update a single row.
I have set the SQLStatement to:
update agency set AgencyLastBatchSeqNo = ? where agencyID = ?
On the Parameter Mapping page I gave set Parameter 0 and Parameter 1 to variables that I know contain the right values. I have also set the Parameter Name values correctly.
In the database, the column AgencyLastBatchSeqNo is an int, AgencyID is a big int. Does anyone have a reference to find what the data types map to in SSIS? I have guessed at SHORT for the int and LONG for the big int.
When I run the task I get the following error:
[Execute SQL Task] Error: Executing the query “update agency set AgencyLastBatchSeqNo = ? where AgencyID = ?” failed with the following error: “Parameter name is unrecognized.”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.
Could anyone please suggest what may be wrong?
A: The answer to this is to change the Parameter Name value in the Parameter Mapping screen.
Given the following query
SELECT Id, AnimalName FROM dbo.Farm WHERE Farm_id = ?
Assuming my Parameter is an integer Variable named User::Farm_id
Choose the following values on the Parameter Mapping Screen
Variable Name – User::Farm_id
Direction – Input
Data Type – LONG
Parameter Name – 0
Parameter Size – -1
Originally the Parameter Name will be “NewParameterName”. Simply change this to the ordinal position of your variable marker (“?”)
Q. Difference Between Persist Security Info And Integrated Security
Hi I know about Integrated Security =True/SSPI is same, but do not know about Persist Security=True Could you please explain Thanks
A: Persist Security = true means that the Password used for SQL authentication is not removed from the ConnectionString property of the connection.
When Integrated Security = true is used then the Persist Security is completely irelevant since it only applies to SQL authentication, not to windows/Integrated/SSPI.
Q. SSIS Script Component Write to Variable
This is with SQL 2005.
I have a script component inside a Data Flow Task. I would like to read from the input columns and write the data to a global user variable.
I’ve set my input columns and added my global user variable as a ReadWriteVariable to the script component properties.
Here is my code, I’m just trying to alter the value of the global user variable here, but its not working. When I write out the value of the variable in another task it still has its default value:
Skip code block
Public Class ScriptMain
Dim updateSQL As String
Public Sub Main()
Dim vars As IDTSVariables90
vars(“SQL_ATTR_Update”).Value = “Test”
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
‘updateSQL = Row.ITMID + Row.PRCCAT
I have also tried with no luck:
Me.ReadWriteVariables(“SQL_ATTR_Update”).Value = “Test”
A: I figured it out.
In Script component code, you use typed accessor properties to access certain package features such as variables and connection managers.
The PreExecute method can access only read-only variables. The PostExecute method can access both read-only and read/write variables.
For more information about these methods, see Coding and Debugging the Script Component.
It looks like Dts is only available in Script Task.
Here is what the code looks like:
Skip code block
Public Class ScriptMain
Dim updateSQL As String
Public Overrides Sub PostExecute()
Me.ReadWriteVariables(“SQL_ATTR_Update”).Value = “Test”
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
‘updateSQL = Row.ITMID + Row.PRCCAT
Q. SSIS – How to access a RecordSet variable inside a Script Task
How do you access a RecordSet variable inside a Script Task?
A: On the script tab, make sure you put the variable in either the readonlyvariables or readwritevariables text boxes.
Here is a simple script that I use to format the errors in a data flow (saved in a RecordSet Variable) into the body of an email. Basically I read the recordset varialbe into a datatable and process it row by row with the for loops. After this task completes I examine the value of uvErrorEmailNeeded to determine if there is anything to email using a conditional process flow connector. You will also need to add a reference to system.xml in your vb script. This is in SQL 2005.
Skip code block
Public Class ScriptMain
Public Sub Main()
Dim oleDA As New OleDbDataAdapter
Dim dt As New DataTable
Dim col As DataColumn
Dim row As DataRow
Dim sMsg As String
Dim sHeader As String
If dt.Rows.Count > 0 Then
Dts.Variables(“uvErrorEmailNeeded”).Value = True
For Each col In dt.Columns
sHeader = sHeader & col.ColumnName & vbTab
sHeader = sHeader & vbCrLf
For Each row In dt.Rows
For Each col In dt.Columns
sMsg = sMsg & row(col.Ordinal).ToString & vbTab
sMsg = sMsg & vbCrLf
Dts.Variables(“uvMessageBody”).Value = “Error task. Error list follows:” & vbCrLf & sHeader& sMsg & vbCrLf & vbCrLf
Dts.TaskResult = Dts.Results.Success
Q. SSIS 2008 Rows per batch and Maximum insert commit size
I’ve got about 100 million rows that I’m moving in SSIS 2008 via a Data Flow Task. It’s pretty much a straight table data copy using a Multicast. My question is this:
Using the OLE DB Destination Editor I have two options: Rows per batch and Maximum insert commit size. What are good settings for this? I’ve only been able to find that you are recommended to set Maximum insert commit size to 2147483647 instead of 0, but then tweak both these settings based on testing. I’m curious to find out if anyone has discovered anything useful in their own management of these values.
A: There is no best value, it depends greatly on the design of the database, the number of users, the kind of hardware you are operating one etc. THat is why you need to test for yourself with your system.
Q. Where are SSIS Packages Saved?
I right clicked on a Database in the object explorer of SQL Server 2008 Management Studio. I went to Tasks > Import Data, and imported some data from a flat text file, opting to save the package on the server.
Now how the heck do I get to the package to edit or run it again? Where in SQL Server Management Studio do I go? I’ve expanded everything and I can’t find it. It’s driving me nuts.
A: If you connect to the Integration Services instance on the server (different choice in the dropdown from “Database Engine” when you connect in SQL Server Management Studio), they’ll be under the MSDB folder under Stored Packages.
When you start management studio and connect to a database, make sure you have the server type set to Integration Services instead of Database Engine.
Q. selectively execute task in ssis control flow
I have a SSIS package with a control flow containing a bunch of execute sql tasks in a sequence.
I need to check a flag for each of the tasks and run the task if it is set, if not skip and go to the next one.
Each of the these task executes a stored proc. So i can check in the proc and “Return” if not set. I was looking for a “SSIS” solution if any.
A: Between your control flow tasks, click on the arrow and choose Edit. When you do this, you get a dialog that allows you to check the “constraint” (success, completion or failure) of the task, an “expression” (i.e. you can have your execute sql task return a value, store that value in a variable, and check the value of that variable in an expression to determine whether to continue down the path you are editing), an “expression and a constraint”, and an “expression or a constraint”. These last two are the same except for the logic. “Expression and constraint” requires a true condition on both the expression and the constraint, “expression or constraint” requires a true condition on only one of the expression and the constraint.
Q. SSIS PrimeOutput Error?
We have an SSIS job that has been running for over a year with no issue. The job takes a data set from a select statement in an oracle db and transfers the result to a table on a SQL Server 2005 instance.
As of this morning, we receive the following error message:
Error: 2010-05-26 05:06:47.71 Code: 0xC02090F5 Source: [job_name] DataReader Source  Description: The component “DataReader Source” (793) was unable to process the data. End Error Error: 2010-05-26 05:06:47.71 Code: 0xC0047038 Source: job_name Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “DataReader Source” (793) returned error code 0xC02090F5. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure. End Error Error: 2010-05-26 05:06:47.71 Code: 0xC0047021 Source: P… The package execution fa… The step failed.
Does anyone know what a root cause might be?
A: There may be error messages posted before this with more information about the failure.
Did you look for other more specific error messages?
Are you logging errors or steps as they run? If so did you look in the logging table? If you aren’t logging in you SSIS package, I’d set that up as part of this fix, it will make it immeasurably earier to find the problem.
Things I would consider: schema changes, permissions changes, any recent software, operating systems updates to the servers involved, data mismatches (the first time perhaps that the Oracle table held data that couldn’t be inserted into the SQL table – check columns that don;t directly match first, string data that might get truncated, dates stored as strings that need to convert to datetime, etc.).
Q. Advantage of SSIS package over windows scheduled exe
I have an exe configured under windows scheduler to perform timely operations on a set of data.
The exe calls stored procs to retrieve data and perform some calcualtions and updates the data back to a different database.
I would like to know, what are the pros and cons of using SSIS package over scheduled exe.
A: Do you mean what are the pros and cons of using SQL Server Agent Jobs for scheduling running SSIS packages and command shell executions? I don’t really know the pros about windows scheduler, so I’ll stick to listing the pros of SQL Server Agent Jobs.
- If you are already using SQL Server Agent Jobs on your server, then running SSIS packages from the agent consolidates the places that you need to monitor to one location.
- SQL Server Agent Jobs have built in logging and notification features. I don’t know how Windows Scheduler performs in this area.
- SQL Server Agent Jobs can run more than just SSIS packages. So you may want to run a T-SQL command as step 1, retry if it fails, eventually move to step 2 if step 1 succeeds, or stop the job and send an error if the step 1 condition is never met. This is really useful for ETL processes where you are trying to monitor another server for some condition before running your ETL.
- SQL Server Agent Jobs are easy to report on since their data is stored in the msdb database. We have regualrly scheduled subscriptions for SSRS reports that provide us with data about our jobs. This means I can get an email each morning before I come into the office that tells me if everything is going well or if there are any problems that need to be tackled ASAP.
- SQL Server Agent Jobs are used by SSRS subscriptions for scheduling purposes. I commonly need to start SSRS reports by calling their job schedules, so I already have to work with SQL Server Agent Jobs.
- SQL Server Agent Jobs can be chained together. A common scenario for my ETL is to have several jobs run on a schedule in the morning. Once all the jobs succeed, another job is called that triggers several SQL Server Agent Jobs. Some jobs run in parallel and some run serially.
- SQL Server Agent Jobs are easy to script out and load into our source control system. This allows us to roll back to earlier versions of jobs if necessary. We’ve done this on a few occassions, particularly when someone deleted a job by accident.
On one ocassion we found a situation where Windows Scheduler was able to do something we couldn’t do with a SQL Server Agent Job. During the early days after a SAN migration we had some scripts for snapshotting and cloning drives that didn’t work in a SQL Server Agent Job. So we used a Windows Scheduler task to run the code for a while. After about a month, we figured out what we were missing and were able to move the step back to the SQL Server Agent Job.
Regarding SSIS over exe stored procedure calls.
- If all you are doing is running stored procedures, then SSIS may not add much for you. Both approaches work, so it really comes down to the differences between what you get from a .exe approach and SSIS as well as how many stored procedures that are being called.
- I prefer SSIS because we do so much on my team where we have to download data from other servers, import/export files, or do some crazy https posts. If we only had to run one set of processes and they were all stored procedure calls, then SSIS may have been overkill. For my environment, SSIS is the best tool for moving data because we move all kinds of types of data to and from the server. If you ever expect to move beyond running stored procedures, then it may make sense to adopt SSIS now.
- If you are just running a few stored procedures, then you could get away with doing this from the SQL Server Agent Job without SSIS. You can even parallelize jobs by making a master job start several jobs via msdb.dbo.sp_start_job ‘Job Name’.
- If you want to parallelize a lot of stored procedure calls, then SSIS will probably beat out chaining SQL Server Agent Job calls. Although chaining is possible in code, there’s no visual surface and it is harder to understand complex chaining scenarios that are easy to implement in SSIS with sequence containers and precedence constraints.
- From a code maintainability perspective, SSIS beats out any exe solution for my team since everyone on my team can understand SSIS and few of us can actually code outside of SSIS. If you are planning to transfer this to someone down the line, then you need to determine what is more maintainable for your environment. If you are building in an environment where your future replacement will be a .NET programmer and not a SQL DBA or Business Intelligence specialist, then SSIS may not be the appropriate code-base to pass on to a future programmer.
- SSIS gives you out of the box logging. Although you can certainly implement logging in code, you probably need to wrap everything in try-catch blocks and figure out some strategy for centralizing logging between executables. With SSIS, you can centralize logging to a SQL Server table, log files in some centralized folder, or use another log provider. Personally, I always log to the database and I have SSRS reports setup to help make sense of the data. We usually troubleshoot individual job failures based on the SQL Server Agent Job history step details. Logging from SSIS is more about understanding long-term failure patterns or monitoring warnings that don’t result in failures like removing data flow columns that are unused (early indicator for us of changes in the underlying source data structure) or performance metrics (although stored procedures also have a separate form of logging in our systems).
- SSIS give you a visual design surface. I mentioned this before briefly, but it is a point worth expanding upon on its own. BIDS is a decent design surface for understanding what’s running in what order. You won’t get this from writing do-while loops in code. Maybe you have some form of a visualizer that I’ve never used, but my experience with coding stored procedure calls always happened in a text editor, not in a visual design layer. SSIS makes it relatively easy to understand precedence and order of operations in the control flow which is where you would be working if you are using execute sql tasks.
- The deployment story for SSIS is pretty decent. We use BIDS Helper (a free add-in for BIDS), so deploying changes to packages is a right click away on the Solution Explorer. We only have to deploy one package at a time. If you are writing a master executable that runs all the ETL, then you probably have to compile the code and deploy it when none of the ETL is running. SSIS packages are modular code containers, so if you have 50 packages on your server and you make a change in one package, then you only have to deploy the one changed package. If you setup your executable to run code from configuration files and don’t have to recompile the whole application, then this may not be a major win.
- Testing changes to an individual package is probably generally easier than testing changes in an application. Meaning, if you change one ETL process in one part of your code, you may have to regression test (or unit test) your entire application. If you change one SSIS package, you can generally test it by running it in BIDS and then deploying it when you are comfortable with the changes.
- If you have to deploy all your changes through a release process and there are pre-release testing processes that you must pass, then an executable approach may be easier. I’ve never found an effective way to automatically unit test a SSIS package. I know there are frameworks and test harnesses for doing this, but I don’t have any experience with them so I can’t speak for the efficacy or ease of use. In all of my work with SSIS, I’ve always pushed the changes to our production server within minutes or seconds of writing the changes.
Let me know if you need me to elaborate on any points. Good luck!
Q. SSIS, outputting null as a column value within a script task
I have a synchronous transformation script component, with some input columns, and an extra defined Output column “ID”. ID outputs an integer (DT_I4) type. Under certain circumstances, I would like “ID” to output a null value from my script.
Assigning it a null value in other SSIS components works, I can use NULL(DT_I4) within a derived column for example, but I cannot figure out how to do this in a script.
Row.ID = null;
claims that ID is not a nullable value type.
I use c# within my scripts.
Is there any way to do this?
Edit: To make life easier, “ID” would ideally output a value of NULL by default if it is not assigned anywhere within the script using Row.ID = …;
A: Every column has a booelan property called ColumnName_IsNull (in your case should be Row.ID_IsNull). I think you should set it to true to set the columns value NULL
Q. How to create a temporary table in SSIS control flow task and then use it in data flow task?
I have a control flow where I create a temp database and table in a with a T-SQL Command. When I add a dataflow I would like to query the table but I can’t because the table doesn’t exist to grab information from. When I try I get errors about logging in because the database doesn’t exist (yet). I have delay validation to true.
If I create the database and table manually then add the dataflow with query and drop the database it sticks but it doesn’t seem like a clean solution.
If there is a better way to create a temporary staging database and query it in dataflows please let me know.
Set the property RetainSameConnection on the Connection Manager to True so that temporary table created in one Control Flow task can be retained in another task.
Here is a sample SSIS package written in SSIS 2008 R2 that illustrates using temporary tables.
Create a stored procedure that will create a temporary table named ##tmpStateProvince and populate with few records. The sample SSIS package will first call the stored procedure and then will fetch the temporary table data to populate the records into another database table. The sample package will use the database named Sora Use the below create stored procedure script.
Skip code block
CREATE PROCEDURE dbo.PopulateTempTable
SET NOCOUNT ON;
IF OBJECT_ID(‘TempDB..##tmpStateProvince’) IS NOT NULL
DROP TABLE ##tmpStateProvince;
CREATE TABLE ##tmpStateProvince
CountryCode nvarchar(3) NOT NULL
, StateCode nvarchar(3) NOT NULL
, Name nvarchar(30) NOT NULL
INSERT INTO ##tmpStateProvince
(CountryCode, StateCode, Name)
(‘CA’, ‘AB’, ‘Alberta’),
(‘US’, ‘CA’, ‘California’),
(‘DE’, ‘HH’, ‘Hamburg’),
(‘FR’, ’86’, ‘Vienne’),
(‘AU’, ‘SA’, ‘South Australia’),
(‘VI’, ‘VI’, ‘Virgin Islands’);
Create a table named dbo.StateProvince that will be used as the destination table to populate the records from temporary table. Use the below create table script to create the destination table.
Skip code block
CREATE TABLE dbo.StateProvince
StateProvinceID int IDENTITY(1,1) NOT NULL
, CountryCode nvarchar(3) NOT NULL
, StateCode nvarchar(3) NOT NULL
, Name nvarchar(30) NOT NULL
CONSTRAINT [PK_StateProvinceID] PRIMARY KEY CLUSTERED
) ON [PRIMARY];
Create an SSIS package using Business Intelligence Development Studio (BIDS). Right-click on the Connection Managers tab at the bottom of the package and click New OLE DB Connection… to create a new connection to access SQL Server 2008 R2 database.
Click New… on Configure OLE DB Connection Manager.
Perform the following actions on the Connection Manager dialog.
- SelectNative OLE DB\SQL Server Native Client 10.0 from Provider since the package will connect to SQL Server 2008 R2 database
- Enter theServer name, like MACHINENAME\INSTANCE
- SelectUse Windows Authentication from Log on to the server section or whichever you prefer.
- Select the database fromSelect or enter a database name, the sample uses the database name Sora.
- ClickTest Connection
- ClickOK on the Test connection succeeded
- ClickOK on Connection Manager
The newly created data connection will appear on Configure OLE DB Connection Manager. ClickOK.
OLE DB connection manager KIWI\SQLSERVER2008R2.Sora will appear under the Connection Manager tab at the bottom of the package. Right-click the connection manager and click Properties
Set the property RetainSameConnection on the connection KIWI\SQLSERVER2008R2.Sora to the value True.
Right-click anywhere inside the package and then click Variables to view the variables pane. Create the following variables.
- A new variable namedPopulateTempTable of data type String in the package scopeSO_5631010 and set the variable with the value EXEC dbo.PopulateTempTable.
- A new variable namedFetchTempData of data type String in the package scope SO_5631010and set the variable with the value SELECT CountryCode, StateCode, Name FROM ##tmpStateProvince
Drag and drop an Execute SQL Task on to the Control Flow tab. Double-click the Execute SQL Task to view the Execute SQL Task Editor.
On the General page of the Execute SQL Task Editor, perform the following actions.
- Set theName to Create and populate temp table
- Set theConnection Type to OLE DB
- Set theConnection to KIWI\SQLSERVER2008R2.Sora
- SelectVariable from SQLSourceType
- SelectUser::PopulateTempTable from SourceVariable
Drag and drop a Data Flow Task onto the Control Flow tab. Rename the Data Flow Task asTransfer temp data to database table. Connect the green arrow from the Execute SQL Task to the Data Flow Task.
Double-click the Data Flow Task to switch to Data Flow tab. Drag and drop an OLE DB Sourceonto the Data Flow tab. Double-click OLE DB Source to view the OLE DB Source Editor.
On the Connection Manager page of the OLE DB Source Editor, perform the following actions.
- SelectKIWI\SQLSERVER2008R2.Sora from OLE DB Connection Manager
- SelectSQL command from variable from Data access mode
- SelectUser::FetchTempData from Variable name
- ClickColumns page
Clicking Columns page on OLE DB Source Editor will display the following error because the table##tmpStateProvince specified in the source command variable does not exist and SSIS is unable to read the column definition.
To fix the error, execute the statement EXEC dbo.PopulateTempTable using SQL Server Management Studio (SSMS) on the database Sora so that the stored procedure will create the temporary table. After executing the stored procedure, click Columns page on OLE DB Source Editor, you will see the column information. Click OK.
Drag and drop OLE DB Destination onto the Data Flow tab. Connect the green arrow from OLE DB Source to OLE DB Destination. Double-click OLE DB Destination to open OLE DB Destination Editor.
On the Connection Manager page of the OLE DB Destination Editor, perform the following actions.
- SelectKIWI\SQLSERVER2008R2.Sora from OLE DB Connection Manager
- SelectTable or view – fast load from Data access mode
- Select[dbo].[StateProvince] from Name of the table or the view
- ClickMappings page
Click Mappings page on the OLE DB Destination Editor would automatically map the columns if the input and output column names are same. Click OK. Column StateProvinceID does not have a matching input column and it is defined as an IDENTITY column in database. Hence, no mapping is required.
Data Flow tab should look something like this after configuring all the components.
Click the OLE DB Source on Data Flow tab and press F4 to view Properties. Set the propertyValidateExternalMetadata to False so that SSIS would not try to check for the existence of the temporary table during validation phase of the package execution.
Execute the query select * from dbo.StateProvince in the SQL Server Management Studio (SSMS) to find the number of rows in the table. It should be empty before executing the package.
Execute the package. Control Flow shows successful execution.
In Data Flow tab, you will notice that the package successfully processed 6 rows. The stored procedure created early in this posted inserted 6 rows into the temporary table.
Execute the query select * from dbo.StateProvince in the SQL Server Management Studio (SSMS) to find the 6 rows successfully inserted into the table. The data should match with rows founds in the stored procedure.
The above example illustrated how to create and use temporary table within a package.
Q. How to check the SSIS package job results after it has completed its execution?
I have an SSIS package which imports the data into the SQL Server 2008 database. I have set up the schedule job in the SQL Server Agent to run that package. When I check the history, I could only see whether the job ran successfully or not. I could not see other messages apart from that.
I would like to know how many records are imported whenever the job is executed. How can I monitor that? Should I use the additional components in SSIS package or set some configurations in SQL Server Agent Job Setup?
I found some logging facilities in SQL Server Agent Job Setup but I am not sure it can fulfill my requirements or not.
A: If you are just interested in knowing the columns being processed and not interested with the info for further use, one possible option is making use of the SSIS logging feature. Here is how it works for data flow tasks.
- Click on the SSIS package.
- On the menus, selectSSIS –> Logging…
- On the ConfigureSSIS Logs: dialog, select the provider type and click Add. I have chosen SQL Server for this example. Check the Name checkbox and provide the data source under Configuration column. Here SQLServer is the name of the connection manager. SSIS will create a table named sysssislog and stored procedure dbo.sp_ssis_addlogentry in the database that you selected. Refer screenshot #1 below.
- If you need the rows processed, select the checkbox OnInformation. Here in the example, the package executed successfully so the log records were found under OnInformation. You may need to fine tune this event selection according to your requirements. Refer screenshot#2
- Here is a sample package execution within data flow task. Refer screenshot#3
- Here is a sample output of the log table dbo.sysssislog. I have only displayed the columnsid andmessage. There are many other columns in the table. In the query, I am filtering the output only for the package named ‘Package1‘ and the event ‘OnInformation‘. You can notice that records with ids 7, 14 and 15 contain the rows processed. Refer screenshot #4
Hope that helps.