Basic Transformation in SSIS


As you can imagine, the primary reason people use SSIS is to read the data from a source and write it to a destination after it’s potentially transformed. This tutorial walks you through a common scenario: you want to copy data from a Flat File Source to a SQL Server table without altering the data. This may be a simple example, but the examples will get much more complex in later Post.

Start the tutorial by going online to the website for this Tutorialand downloading the sample extract that contains zip code information about cities. The zip code extract was retrieved from public record data from the 1990 census and has been filtered to contain only Florida cities, in order to reduce download time. You’ll use this in the next tutorial as well, so it’s very important not to skip this first tutorial.

Open SQL Server Data Tools (SSDT) and select File ⇒ New ⇒ Project. Then select Integration Services Project as your project type. Type ProSSISCh08 as the project name, and accept the rest of the defaults (as shown below the screen shot). You can place the project anywhere on your computer; the default location is under the users\”Your User Name”\Documents\Visual Studio 2012\Projects folder. In this example, the solution is created in c:\ProSSIS\Data\Ch08, but you may use the default location if you so desire.

Transformation new project

The project will be created, and you’ll see a default Package.dtsx package file in the Solution Explorer. Right-click the file, select Rename, and rename the file ZipLoad.dtsx. If the package isn’t open yet, double-click it to open it in the package designer.

Creating Connections

Now that you have the package created, you need to add a connection that can be used across multiple packages. In the Solution Explorer, right-click the Connection Managers folder and select New Connection Manager. This opens the Add SSIS Connection Manager window.

Select the OLEDB connection and click Add, which opens the Configure OLEDB Connection Manager window. If this is the first time you are creating a connection, then the Data Connection list in the left window will be empty.

NOTE There are many ways to create the connection. For example, you could create it as you’re creating each source and destination. You can also use the new Source and Destination Assistants when building a Data Flow. Once you’re more experienced with the tool, you’ll find what works best for you.

Click the New button at the bottom of the window. Your first Connection Manager for this example will be to SQL Server, so select Native OLE DB\SQL Native Client 11.0 as the Provider. For the Server Name option, type the name of your SQL Server and enter the authentication mode that is necessary for you to read and write to the database, as shown in below the screen shot. If you are using a local instance of SQL Server, then you should be able to use Localhost as the server name and Windows authentication for the credentials. Lastly, select the AdventureWorks database. If you don’t have the AdventureWorks database, select any other available database on the server. You can optionally test the connection. Now click OK. You will now have a Data Source in the Data Source box that should be selected. Click OK. You will now see a Data Source under the Connection Manager folder in the Solution Explorer, and the same Data Source in the Connection Manager window in the ZipLoad Package you first created.

Connection manager

This new Data Source will start with “(project)”. The name of the connection automatically contains the server name and the database name. This is not a good naming convention because packages are almost always moved from server to server — for example, development, QA, and production servers. Therefore, a better naming convention would be the connection type and the database name. Right-click on the Connection Manager you just created in the Solution Explorer and rename it to OLEDB_AdventureWorks. You can create connections in the package and convert them to project connections also. Any project-level connections automatically appear in the Connection Manager window in all packages in that project. If a connection is needed only in one package in the project, then it can be created at the package level.

To create a connection at the package level, right-click in the Connection Manager window in a package and follow the same steps used to create the project-level connection described previously.

Next, create a Flat File connection and point it to the ZipCodeextract.csv file in your C:\ProSSIS\Data\Ch08 directory. Right-click in the Connection Manager area of the package designer, and select New Flat File Connection. Name the connection FF_ZipCode_CSV, and add any description you like. Point the File Name option to C:\ProSSIS\Data\Ch08\ZipCodeExtract.csv or browse to the correct location by clicking Browse.

NOTE If you can’t find the file, ensure that the file type filter is adjusted so you’re not looking for just *.txt files, which is the default setting. Set the filter to All Files to ensure you can see the file.

Set the Format dropdown box to Delimited, with <none> set for the Text Qualifier option; these are both the default options. The Text Qualifier option enables you to specify that character data is wrapped in quotes or some type of qualifier. This is helpful when you have a file that is delimited by commas and you also have commas inside some of the text data that you do not wish to separate by. Setting a Text Qualifier will ignore any commas inside the text data. Lastly, check the “Column names in the first data row” option. This specifies that your first row contains the column names for the file.

Select the Columns tab from the left side of the editor to see a preview of the file data and to set the row and column delimiters. The defaults are generally fine for this screen. The Row Delimiter option should be set to {CR}{LF}, which means that a carriage return and line feed separates each row. The Column Delimiter option is carried over from the first page and is therefore set to “Comma {,}”. In some extracts that you may receive, the header record may be different from the data records, and the configurations won’t be exactly the same as in the example.

Now select the Advanced tab from the left side of the editor. Here, you can specify the data types for each of the three columns. The default for this type of data is a 50-character string, which is excessive in this case. Click Suggest Types to comb through the data and find the best data type fit for it. This will open the Suggest Column Types dialog, where you should accept the default options and click OK.

At this point, the data types in the Advanced tab have changed for each column. One column in particular was incorrectly changed. When combing through the first 100 records, the Suggest Column Types dialog selected a “four-byte signed integer [DT_I4]” for the zip code column, but your Suggest Types button may select a smaller data type based on the data. While this would work for the data extract you currently have, it won’t work for states that have zip codes that begin with a zero in the northeast United States. Change this column to a string by selecting “string [DT_STR]” from the DataType dropdown, and change the length of the column to 5 by changing the OutputColumnWidth option (see the below the screen shot). Finally, change the TextQualified option to False, and then click OK.

the OutputColumnWidth option

Creating the Control Flow

With the first two connections created, you can now define the package’s Control Flow. In this tutorial, you have only a single task, the Data Flow Task. In the Toolbox, drag the Data Flow Task over to the design pane in the Control Flow tab. Next, right-click the task and select Rename to rename the task Load ZipCode Data.

Creating the Data Flow

This section reflects the most detailed portion of almost all of your packages.

The Data Flow is where you will generally spend 70 percent of your time as an SSIS developer. To begin, double-click the Data Flow Task to drill into the Data Flow.

NOTE When opening a task for editing, always double-click the icon on the left side of the task. Otherwise, you may start renaming the task instead of opening it. You can also right-click on the task and select Edit.

Double-clicking on the Data Flow Task will automatically take you to the Data Flow tab. Note that the name of the task — Load ZipCode Data — is displayed in the Data Flow Task dropdown. If you had more than one Data Flow Task, the names of each would appear as options in the dropdown.

Drag and drop a Source Assistant from the Toolbox onto the Data Flow design pane. Select Flat File in the left pane of the Source Assistant. Then select FF_ZipCode_CSV in the right pane and Click OK. Rename the source Florida ZipCode File in the Properties window.

NOTE All the rename instructions in these tutorials are optional, but they will keep you on the same page as the tutorials in this Tutorial. In a real-world situation, ensuring the package task names are descriptive will make your package self-documenting. This is due to the names of the tasks being logged by SSIS. 

Double-click on the source, and you will see it is pointing to the Connection Manager called FF_ZipCode_CSV. Select the Columns tab and note the columns that you’ll be outputting to the path. You’ve now configured the source, and you can click OK.

Next, drag and drop Destination Assistant onto the design pane. Select SQL Server in the left pane and OLEDB_AdventureWorks in the right pane in the Destination Assistant window and click OK. Rename the new destination AdventureWorks. Select the Florida ZipCode File Source and then connect the path (blue arrow) from the Florida ZipCode File Source to AdventureWorks. Double-click the destination, and AdventureWorks should already be selected in the Connection Manager dropdown box. For the Name of the Table or View option, click the New button next to the dropdown box. This is how you can create a table inside SSDT without having to go back to SQL Server Management Studio.

The default DDL for creating the table uses the destination’s name (AdventureWorks), and the data types may not be exactly what you’d like. You can edit this DDL, as shown here:

CREATE TABLE [AdventureWorks] (
[StateFIPCode] smallint,
[ZipCode] varchar(5),
[StateAbbr] varchar(2),
[City] varchar(16),
[Longitude] real,
[Latitude] real,
[Population] int,
[AllocationPercentage] real

However, suppose this won’t do for your picky DBA, who is concerned about performance. In this case, you should rename the table ZipCode (taking out the brackets) and change each column’s data type to a more suitable size and type, as shown in the ZipCode and StateAbbr columns (Ch08SQL.txt):

[StateFIPCode] smallint,
[ZipCode] char(5),
[StateAbbr] char(2),
[City] varchar(16),
[Longitude] real,
[Latitude] real,
[Population] int,
[AllocationPercentage] real

When you are done changing the DDL, click OK. The table name will be transposed into the Table dropdown box. Finally, select the Mapping tab to ensure that the inputs are mapped to the outputs correctly. SSIS attempts to map the columns based on name; in this case, because you just created the table with the same column names, it should be a direct match, as shown in below screen shoot.

OLE DB Destination Editor
After confirming that the mappings look like above screen shoot, click OK.

Completing the Package

With the basic framework of the package now constructed, you need to add one more task to the Control Flow tab to ensure that you can run this package multiple times. To do this, click the Control Flow tab and drag an Execute SQL task over to the design pane. Rename the task Purge ZipCode Table. Double-click the task and select OLEDB_AdventureWorks from the Connection dropdown. Finally, type the following query for the SQLStatement option (you can also click the ellipsis button and enter the query):


Click OK to complete the task configuration. To connect the task as a parent to the Load ZipCode Info Task, click the Purge ZipCode Table Task and drag the green arrow onto the Load ZipCode Info Task.

Saving the Package

Your first package is now complete. To save the package, click the Save icon in the top menu or select File ⇒ Save Selected Items. Note here that by clicking Save, you’re saving the .DTSX file to the project, but you have not saved it to the server yet. To do that, you have to deploy the project. Also, SSIS does not version control your packages independently. To version control your packages, you need to integrate a solution like Subversion (SVN) into SSIS.

Executing the Package

With the package complete, you can attempt to execute it by right-clicking on the package name in the solution explorer and selecting Execute Package. This is a good habit to get into when executing, because other methods, like using the green debug arrow at the top, can cause more to execute than just the package. The package will take a few moments to validate, and then it will execute.

You can see the progress under the Progress tab or in the Output window. In the Control Flow tab, the two tasks display a small yellow circle that begins to spin (in the top right of the task). If all goes well, they will change to green circles with a check. If both get green checks, then the package execution was successful. If your package failed, you can check the Output window to see why. The Output window should be open by default, but in case it isn’t, you can open it by clicking View ⇒ Output. You can also see a graphical version of the Output window in the Progress tab (it is called the Execution Results tab if your package stops). The Execution Results tab will always show the results from the latest package run in the current SSDT session.

You can go to the Data Flow tab, shown in below screen shot, to see how many records were copied over. Notice the number of records displayed in the path as SSIS moves from source to destination.

basic transformation

By default, when you execute a package, you are placed in debug mode. Changes you make in this mode are not made available until you run the package again, and you cannot add new tasks or enter some editors. To break out of this mode, click the square Stop icon or click Debug ⇒ Stop Debugging.

0 Responses on Basic Transformation in SSIS"

Leave a Message

Your email address will not be published. Required fields are marked *

Copy Rights Reserved © Mindmajix.com All rights reserved. Disclaimer.