How to Create SSIS Package in SQL Server 2014
THE SSIS PACKAGE DESIGNER
The SSIS Package Designer contains the design panes that you use to create an SSIS package. This tool contains all the items you need to move data or create a workflow with minimal or no code. The great thing about SSIS is that it is like programming with building blocks. The Package Designer contains five tabs: Control Flow, Data Flow, Parameters, Event Handlers, and Package Explorer. One additional tab, Progress, also appears when you execute packages. This Progress tab is renamed to Execution Results after the package stops running and you click Stop.
This chapter focuses on exploring the Control Flow tab, and the bulk of the next chapter dives into details about this tab. In SSIS, the Control Flow contains tasks and the workflow of the package and is separated from most of the movement of data in the Data Flow tab. This usability feature gives you greater control when creating and editing packages. The task that binds the Control Flow and Data Flow together is the Data Flow Task, which you study in depth over the next two chapters.
Understanding the difference between the Control Flow and Data Flow tabs represents one of the largest learning curves for a new SSIS developer. The easiest way to keep them straight is to think of the Data Flow tab as simply a way to configure the Data Flow Task. This separation gives you a huge amount of power when configuring the task. The other way to differentiate the two tabs is that the Control Flow tab handles the workflow of the package and ties the tasks together, whereas the Data Flow tab handles a data load.
The Control Flow contains the workflow parts of the package, which include the tasks, containers, and precedence constraints. SSIS has introduced the new concept of containers, which was briefly discussed in Chapter 1 and is covered in detail in Chapter 6. In the Control Flow tab, you can click and drag a task from the SSIS Toolbox into the Control Flow design pane. Once you have a task created, you can double-click the task to configure it. Until the task is configured, you may see a yellow warning or red error indicator on the task.
After you configure the task, you can link it to other tasks by using precedence constraints. Once you click the task, you’ll notice a green arrow pointing down from it, as shown in below diagram.
For example, to create an On Success precedence constraint, click the green arrow coming out of the task and drag it to the task you wish to link to the first task. In below diagram, you can see the On Success precedence constraint between an Execute Process task called EPR – Run Notepad and an Execute Process task called EPR – Run Calc.
In the Data Flow tab, when you click a source or a transformation, you’ll also see a blue and red arrow pointing down, enabling you to quickly direct your good (blue) or bad (red) data to a separate output. Therefore, if you run a formula that returns an error in the Data Flow, that single row could be outputted to a different table, and then all other rows could continue down the proper path.
In the Control Flow, though, you need to use a different approach. If you want the next task to execute only if the first task has failed, create a precedence constraint as shown earlier for the On Success constraint. After the constraint is created, double-click the constraint arrow. You’ll be taken to the Precedence Constraint Editor (see below diagram).
Use this editor to set what type of constraint you’ll be using. Three options are available in the Value dropdown field: Success, Failure, or Completion. A success constraint will fire the second task or container only when the first task or container succeeds. A failure constraint executes the second task or container only when the first task fails. The last scenario is a completion constraint, which will execute the second task or container whether the first task succeeds or fails.
In SSIS, you have the option to add a logical AND or OR when a task has multiple constraints. In the Precedence Constraint Editor in SSIS, you can configure the task to execute only if the group of predecessor tasks has completed (AND), or if any one of the predecessor tasks has completed (OR).
If a constraint is a logical AND, the precedence constraint line is solid. If it is set to OR, the line is dotted. This is useful if you want to be notified when any one of the tasks fails by using the logical OR constraint.
In the Evaluation Operation dropdown, you can edit how the task will be evaluated:
- Constraint: Evaluates the success, failure, or completion of the predecessor task or tasks
- Expression: Evaluates the success of a customized condition that is programmed using an expression
- Expression and Constraint: Evaluates both the expression and the constraint before moving to the next task
- Expression or Constraint: Determines whether either the expression or the constraint has been successfully met before moving to the next task
If you select Expression or one of its variants as your option, you’ll be able to type an expression in the Expression box. An expression is usually used to evaluate a variable before proceeding to the next task. For example, if you want to ensure that InputFileVariable variable is equal to the Variable2 variable, you would use the following syntax in the Expression box:
@InputFileVariable == @Variable2
You can also single-click the constraint and use the Properties window on the right to set these properties, if you prefer not to use the editor.
Now that you know more about constraints, it’s a good time to look at a scenario and how the constraints apply to your environment. The example shown in below diagram demonstrates how flexible precedence constraints can be. The first Script Task runs a script to determine whether the file exists. An expression on the precedence constraint coming out of the Script Task determines whether the answer returned from the script is set to true or false. If the file exists, then Task A is run; otherwise, Task B is run. After that, note that regardless of whether Task A or B is run, the Archive Script Task always runs because of the Logical Or constraint. Finally, if the Sequence Container succeeds, then Task C is run, and if it does not, then the Alert Script Task is run.
You may have noticed that in Figure 2-17, I added the names of the precedence constraints (success, failure, completion) to each of the arrows. This may be necessary for those who are color-blind (and is required for this tutorial because it’s printed in black and white). To do this in SSDT, select Tools ⇒ Options, and then check “Show precedence constraint labels” under the Business Intelligence Designers ⇒ Integration Services Designers tab, as shown in below diagram.
A very nice usability feature in SSIS is the capability to group tasks or transforms logically in containers. For example, if you have a group of tasks that create and purge the staging environment, you can group them together so that your package is not cluttered visually. For example, in below diagram there are two tasks to load data and send a message. To group them, select both tasks by clicking one task and holding the Ctrl key down while you select the second task. Then, right-click the tasks and select Group.
NOTE Groups and containers are not the same. Groups are a usability feature to bring together components. Containers on the other hand allow you to pass properties into them. You can read more about groups, containers, and their differences in Chapter 6.
After you have the two tasks grouped, you’ll see a box container around them. A group is not the same as a container. It only gives you a usability feature to bring tasks together. They don’t share properties like transactions. To rename the group, simply double-click the container and type the new name over the old one. You can also collapse the group so that your package isn’t cluttered. To do this, just click the arrows that are pointing downward in the group. Once collapsed, your grouping will look like below diagram. You can also ungroup the tasks by right-clicking the group and selecting Ungroup.
This same type of group can be created in the Data Flow tab to logically group sources, transformations, and destinations together. You’ll learn more about containers in Chapter 6.
Annotations are a key part of any package, and a good developer won’t fail to include them. An annotation is a comment that you place in your package to help others and yourself understand what is happening in the package. To add an annotation, right-click where you want to place the comment, select Add Annotation, and begin typing. You can resize the box if you need more room. It is a good idea to always add an annotation to your package that shows the title and version of your package. Most SSIS developers also add a version history annotation note to the package, so that they can see what has changed in the package between releases and who performed the change. You can see an example of this in below diagram.
You may have already noticed the Connection Managers tab at the bottom of your Package Designer pane. This tab contains a list of connections that both Control Flow and Data Flow Tasks can use. Whether the connection is an FTP address or a connection to an Analysis Services server, you’ll see a reference to it here. These connections can be referenced as either sources or targets in any of the operations, and they can connect to relational or Analysis Services databases, flat files, or other data sources. They’re also used in many of the SSIS tasks.
When you create a new package, no connections are defined. You can create a connection by right-clicking in the Connections area and choosing the appropriate data connection type. Once the connection is created, you can rename it to fit your naming conventions or to better describe what is contained in the connection. Nearly any task or transformation that uses data or references a file will require a Connection Manager. Below Diagram shows a few connections: two to relational databases (AdventureWorksDW and Stage), an SMTP reference, and a directory.
Notice the two Connection Managers that refer to the AdventureWorksDW database and the staging database. The one with a database icon is a local Connection Manager that can be seen only inside the current package. The Connection Manager with (project) in front is a project connection that can be seen in any package. Any local Connection Manager can be converted to a project connection by right-clicking it and selecting Convert to Project Connection. Note that we have used a naming convention so the designer can easily recognize what type of connection it is.
Variables are a powerful piece of the SSIS architecture; they enable you to dynamically control the package at runtime, much like you do in any .NET language. There are two types of variables: system and user. System variables are those built into SSIS, such as a package name or the package’s start time; user variables are created by the SSIS developer. Variables can also have varying scope, with the default scope being the entire package. They can also be set to be in the scope of a container, a task, or an event handler inside the package.
One of the optional design-time windows can display a list of variables. To access the Variables window, right-click in the design pane and select Variables, or select Variables from the SSIS menu. The Variables window (shown in below diagram) will appear in the bottom of SSDT by default. Also by default, you will see only the user variables; to see the system variables as well, click Grid Options and then select the Show System Variables icon in the top of the window. To add a new variable, click the Add Variable icon in the Variables window and type the variable name.
When you click the Add Variable icon, whatever task or container you select at the time will be the scope for the variable. Once the scope was set for a variable in the past SQL Server 2005 and 2008 releases, it could not be changed. Now a nice little feature in 2014 is you can change the variable scope at any time. Some of the additional columns are the Namespace and Raise Event on Variable Change properties. Lastly, you can select a variable and go to the Properties pane to see extended properties on the variable. We discuss these properties in more depth in Chapter 3.
You’ll find yourself regularly using system variables throughout your package for auditing or error handling. Some of the package-scoped system variables that you may find interesting for auditing purposes are listed in the following table.
Variables are discussed in greater detail in each chapter. For a full list of system variables, please refer to tutorials Online under “System Variables.”
Most of your time in SSIS is spent in the Data Flow tab. When you add a Data Flow Task to the Control Flow design surface, a subsequent Data Flow is created in the Data Flow tab. Whereas a package has a single Control Flow, it can have many Data Flows. You can expand the Data Flow by double-clicking the task or by going to the Data Flow tab and selecting the appropriate Data Flow Task from the top dropdown box (shown in below diagram). In the Data Flow, the key components are sources, destinations transformations (which appear in the SSIS Toolbox), and paths. Unlike in the Control Flow, where tasks and containers are connected by precedence constraints that define the package’s execution workflow; in the Data Flow, sources, transformations, and destinations are connected by paths that define the flow of data between these components. As you make a connection, the metadata (information about the columns and their data types) becomes available to the next component in the Data Flow path. Everything after the source is performed in memory with a very few exceptions.
When you first start defining the Data Flow, you add a source component to connect to a data source, and then a destination to go to. The transformations (also known as transforms throughout this tutorial) modify the data before it is written to the destination. As the data flows through the path from transform to transform, the data changes based on what transform you have selected. This entire process is covered in much more detail in Chapter 4.
The Parameters tab enables you to create input parameters for a package. These are different from variables in that they can easily be passed in from a DBA or a job. If you’re familiar with SQL Server 2005 or 2008 SSIS, these best resemble configuration files and are indeed replacements for configuration files and tables. You can find parameters in the list of tabs in SSIS. This tab identifies project level parameters but you can also have parameters that are specific to individual packages. Parameters can be made secure by setting the Sensitive property to True, as shown in below diagram, and they can be used to override nearly any property in SSIS. Some parameters can also be set to Required by setting the corresponding property to True, meaning the package won’t run without passing in this parameter. Using parameters will be discussed in more detail in Chapter 5.
The Event Handlers tab enables you to create workflows to handle errors, warnings, or completion in tasks, containers, or packages. For example, if you want to trap any errors and have them e-mailed to you, you could create an OnError event handler that is scoped to the entire package and configure it to send a message out to an operator.
You can configure the event handler’s scope under the Executable dropdown box. An executable can be a package, a Foreach Loop container, a For Loop container, a Sequence container, or a task. In the Event Handler box, you can specify the event you wish to monitor. The events you can select are described in the following table:
Event handlers are critically important to developing a package that is “selfhealing” — that is, it can correct its own problems. The key events to handle are OnError, OnWarning, OnPreExecute, and OnPostExecute. You will learn more about event handlers in Chapter 18.
The final tab in the SSIS Package Designer is the Package Explorer tab, shown in below diagram. This tab consolidates all the design panes into a single view, and lists all the tasks, connections, containers, event handlers, variables, and transforms in your package. You can double-click any item here to configure it easily. You can also modify the properties for the item in the Properties window on the right after selecting the item you wish to modify.
This tab is useful if you have a task that is throwing an error and you can’t find it to remove or fix it. This problem happens sometimes when you have tasks that accidentally fall behind a container or another task.
Executing a Package
When you want to execute a package, you can click the Play icon on the toolbar, or press F5, or choose Debug ⇒ Start. You can also execute packages by right-clicking the package in Solution Explorer and selecting Execute Package. This last technique may be a better habit to get into because clicking the Play button initiates a build, and if your packages are in a solution that has multiple projects, it may also deploy SSRS reports or SSAS cubes if those projects are included in the solution. This puts the design environment into execution mode, opens several new windows, enables several new menu and toolbar items, and begins to execute the package. When the package finishes running, SSDT doesn’t immediately go back to design mode but rather stays in execution mode to allow you to inspect any runtime variables or view any execution output. This also means that you can’t make some changes to the objects within the package. You may already be familiar with this concept from executing .NET projects.
To return to design mode, you must click the Stop icon on the debugging toolbar, or press Shift+F5, or choose Debug ⇒ Stop Debugging.
SSIS delineates between the SSIS developer and the administrator. SQL Server Management Studio is where administrators do most of their work — executing, securing, and updating packages. From the Management Studio interface, the administrator is not able to design packages, however. This function is reserved for SSDT only.
You can open SQL Server Management Studio under the Microsoft SQL Server program group on the Start menu. Then, in the Object Browser pane (which can be opened from the View menu if it’s closed), select Connect ⇒ Database Engine. Type your SQL Server instance name and click Connect. If you receive an error, you may want to jump ahead to Chapter 22, which explains how to correct connectivity issues. There’s also an SSIS service you can connect to, but this is for packages running in legacy mode. You’ll learn much more about this in Chapter 22 as well.