SSIS Workflow Tasks – Integration Services
Recommended by 0 users
So far, we’ve been focused on tasks that are occurring within the immediate realm of ETL processing. You’ve looked at tasks for creating control structures, preparing data, and performing RDBMS operations. This section looks at being able to control other processes and applications in the operating system. Here we sidestep a bit from typical ETL definitions into things that can be more enterprise application integration (EAI) oriented. SSIS packages can also be organized to execute other packages or to call external programs that zip up files or send e-mail alerts, and even put messages directly into application queues for processing.
Execute Package Task
The Execute Package Task enables you to build SSIS solutions called parent packages that execute other packages called child packages. You’ll find that this capability is an indispensable part of your SSIS development as your packages begin to grow. Separating packages into discrete functional workflows enables shorter development and testing cycles and facilitates best development practices. Though the Execute Package Task has been around since the legacy DTS, several improvements have simplified the task:
- The child packages can be run as either in-process or out-ofprocess executables. In the Package tab of the Execute Package Task Editor is the ExecuteOutOfProcess property; set to the default value of false, it will execute the package in its own process and memory space. A big difference in this release of the task compared to its 2008 or 2005 predecessor is that you execute packages within a project to make migrating the code from development to QA much easier.
- The task enables you to easily map parameters in the parent package to the child packages now too.
The majority of configurable properties are in the Package tab of the Execute Package Task Editor. The first option provides the location of the child package. The ReferenceType option can be either External or Project References. This means you can point to a package inside your current project or outside the project to a SQL Server or file system. The best (easiest) option is to refer to a package in a project, as this option will easily “repoint” the reference as you migrate to production. If you point to an External Reference, you’ll need to create a Connection Manager that won’t automatically repoint as you migrate your packages from development to production. The configured tab will look like Figure 3-29.
Next, go to the Parameter Bindings tab to pass parameters into the child package. First, select any parameters in the child package from its dropdown box, and then map them to a parameter or variable in the parent package. Parameters will only work here with Project Referenced packages. You can see an example of this in Figure 3-30, or download the entire example from www.wrox.com/go/prossis2014.
Execute Process Task
The Execute Process Task will execute a Windows or console application inside of the Control Flow. You’ll find great uses for this task to run command-line-based programs and utilities prior to performing other ETL tasks. The most common example would have to be unzipping packed or encrypted data files with a command-line tool.
You can store any errors resulting from the execution of the task into a variable that can be read later and logged. In addition, any output from the command file can also be written to a variable for logging purposes. Figure 3- 31 shows a sample of using the Execute Process Task to expand a compressed customers.zip file.
The Process tab in the Execute Process Task Editor contains most of the important configuration items for this task:
- RequireFullFileName property: Tells the task whether it needs the full path to execute the command. If the file is not found at the full path or in the PATH environment variables of the machine, the task will fail. Typically, a full path is used only if you want to explicitly identify the executable you want to run. However, if the file exists in the System32 directory, you wouldn’t normally have to type the full path to the file because this path is automatically known to a typical Windows system.
- Executable property: Identifies the path and filename for the executable you want to run. Be careful not to provide any parameters or optional switches in this property that would be passed to the executable. Use the Arguments property to set these types of options separately. For example, Figure 3-31 shows that the task will execute expand.exe and pass in the cabinet from which you want to extract and where you’d like it to be extracted.
- WorkingDirectory option: Contains the path from which the executable or command file will work.
- StandardInputVariable parameter: This is the variable you want to pass into the process as an argument. Use this property if you want to dynamically provide a parameter to the executable based on a variable.
- StandardOutputVariable parameter: You can also capture the result of the execution by setting the property StandardOutputVariable to a variable.
- StandardErrorVariable property: Any errors that occurred from the execution can be captured in the variable you provide in this property.
These variable values can be used to send back to a scripting component to log or can be used in a precedence constraint that checks the length of the variables to determine whether you should go to the next task. This provides the logical functionality of looping back and trying again if the result of the execution of the expand.exe program was a sharing violation or another similar error.
Other options in the Process tab include:
- FailTaskIfReturnCodeIsNotSuccessValue property: Another option for validating the task.
- SuccessValue option: The Execute Process Task will fail if the exit code passed from the program is different from the value provided in the SuccessValue option. The default value of 0 indicates that the task was successful in executing the process.
- Timeout/TerminateProcessAfterTimeOut properties: The Timeout property determines the number of seconds that must elapse before the program is considered a runaway process. A value of 0, which is the default, means the process can run for an infinite amount of time. This property is used in conjunction with the TerminateProcessAfterTimeOut property, which, if set to true, terminates the process after the timeout has been exceeded.
- WindowStyle option: This can set the executable to be run minimized, maximized, hidden, or normal. If this is set to any option other than hidden, users will be able to see any windows that potentially pop up and may interact with them during runtime. Typically, these are set to hidden once a package is fully tested.
With the Execute Process Task, you can continue to use command-line or out-of-processes executables to organize work for ETL tasks. Now it’s time to take a look at how SSIS can interact and integrate with your enterprise messaging bus.
Message Queue Task
The Message Queue Task enables you to send or receive messages from Microsoft Message Queuing (MSMQ) right out of the box. For integration with other messaging systems like IBM’s MQ Series or Tibco’s Rendezveus, you need to either code to a library within a Script Task, create a custom component, or execute T-SQL statements to a SQL Server Service Broker queue. Messaging architectures are created to ensure reliable communication between two disparate subsystems.
A message can be a string, a file, or a variable. The main benefit to using this task is the capability to make packages communicate with each other at runtime. You can use this to scale out your packages, having multiple packages executing in parallel, with each loading a subset of the data, and then checking in with the parent package after they reach certain checkpoints. You can also use this task for enterprise-level information integration to do things like deliver dashboard-level information using XML files to an enterprise bus or distribute report content files across your network. Satellite offices or any other subscriber to those services could pull content from the queue for application-level processing.
The task is straightforward. In the General tab, shown in Figure 3-32, you specify the MSMQ Connection Manager under the MSMQConnection property. Then, you specify whether you want to send or receive a message under the Message option. In this tab, you can also specify whether you want to use the legacy Windows 2000 version of MSMQ; this option is set to false by default.
The bulk of the configuration is under the Send or Receive tab (the one you see varies according to the Message option you selected in the General tab). If you’re on the Receive tab, you can configure the task to remove the message from the queue after it has been read. You can also set the timeout properties here, to control whether the task will produce an error if it experiences a timeout.
Regardless of whether you’re sending or receiving messages, you can select the type of the message under the MessageType option. You can either send or receive a string message, a variable, or a data file. Additionally, if you’re receiving a message, you can immediately store the message you receive in a package variable by setting String Message to Variable and then specifying a variable in the Variable option.
Send Mail Task
The Send Mail Task provides a configurable SSIS task for sending e-mail messages via SMTP. In legacy DTS packages, you had to send messages out through MAPI, which meant installing Outlook on the server on which the package was running. That is now no longer a requirement. Most of the configuration options are set in the Mail tab (shown in Figure 3-33) of the Send Mail Task Editor. The SmtpConnection property is where you either create a new or select an existing SMTP Connection Manager.
Most of the configuration options will depend upon your specific SMTP connection. One option of special interest is the MessageSourceType property, which specifies whether the message source will be provided from a file or a variable or be directly inputted into the MessageSource property. Typically, the best practice is to use a variable-based approach to set the MessageSource property.
WMI Data Reader Task
Windows Management Instrumentation (WMI) is one of the best-kept secrets in Windows. WMI enables you to manage Windows servers and workstations through a scripting interface similar to running a T-SQL query. The WMI Data Reader Task enables you to interface with this environment by writing WQL queries (the query language for WMI) against the server or workstation (to look at the Application event log, for example). The output of this query can be written to a file or variable for later consumption. Following are some applications for which you could use the WMI Data Reader Task:
- Read the event log looking for a given error.
- Query the list of applications that are running.
- Query to see how much RAM is available at package execution for debugging.
- Determine the amount of free space on a hard drive.
To get started, you first need to set up a WMI Connection Manager in the Connection Manager Editor. Connection requirements vary, but Figure 3-34 shows an example of a WMI connection for a typical standalone workstation.
Notice here that the Use Windows Authentication option has been set. WMI typically requires a higher level of security authorization because you are able to query OS-level data. With a WMI connection, you can configure the WMI Data Reader Task Editor using the WMI Options tab shown in Figure 3-35.
- WmiConnection/WqlQuerySourceType: First, you set the WMIConnection, and then determine whether the WMI query will be directly inputted, retrieved from a variable, or retrieved from a file, and set the WqlQuerySourceType.
- WqlQuerySource: Specifies the source for the query that you wish to run against the connection. This may be a variable name, a text filename, or a hardcoded query itself.
- OutputType: This option specifies whether you want the output of the query to retrieve just the values from the query or also the column names along with the values.
- OverwriteDestination: This option specifies whether you want the destination to be overwritten each time it is run, or whether you want it to just append to any configured destination. If you save the output to an object variable, you can use the same technique of shredding a recordset that you learned earlier in the Execute SQL Task.
WQL queries look like SQL queries, and for all practical purposes they are, with the difference that you are retrieving data sets from the operating systems. For example, the following query selects the free space, the name, and a few other metrics about the C: drive (see code file Ch03SQL.txt):
SELECT FreeSpace, DeviceId, Size, SystemName, Description FROM
WHERE DeviceID = ‘C:’
The output of this type of query would look like this in a table:
Description, Local Fixed Disk
The following example of a WQL query selects information written to the Application event log after a certain date about the SQL Server and SSIS services (code file Ch03SQL.txt):
SELECT * FROM Win32_NTLogEvent WHERE LogFile = ‘Application’ AND
(SourceName=’SQLISService’ OR SourceName=’SQLISPackage’) AND
The results would look like this:
Typically, the WMI Data Reader Task is used in SQL Server administration packages to gather operational-type data from the SQL Server environments.
However, the WMI Event Watcher Task has some interesting uses for ETL
processes that you’ll look at next.
WMI Event Watcher Task
The WMI Event Watcher Task empowers SSIS to wait for and respond to certain WMI events that occur in the operating system. The task operates in much the same way as the WMI Data Reader Task operates. The following are some of the useful things you can do with this task:
- Watch a directory for a certain file to be written.
- Wait for a given service to start.
- Wait for the memory of a server to reach a certain level before executing the rest of the package or before transferring files to the server.
- Watch for the CPU to be free.
To illustrate the last example of polling to determine when the CPU is less than 50 percent utilized, you could have the WMI Event Watcher Task look for an event with this WQL code:
SELECT * from __InstanceModificationEvent WITHIN 2 WHERE
‘Win32_Processor’ and TargetInstance.LoadPercentage < 50
The next section looks at a direct application of this WMI Event Watcher Task to give you a better idea of how to configure it and what it can do.
Polling a Directory for the Delivery of a File
One very practical use of the WMI Event Watcher for ETL processing is to provide a buffer between the time when an SSIS job starts and the time when a file is actually delivered to a folder location. If there is a window of variability in file delivery and an SSIS package starts on a onetime schedule, then it is possible to miss processing the file for the day. By using a WMI Event Watcher, you can set up your SSIS packages to poll a folder location for a set period of time until a file is detected. If you have this type of challenge, a better solution may be a ForEach Loop Container scheduled to run periodically, but you’ll learn more about that in Chapter 6. To set up a task to perform this automated action, open the WMI Options tab of the WMI Event Watcher Task Editor (see Figure 3-36). Notice that this WMI Task is completely different from the WMI Data Reader Task.
This WMI Event Watcher Task provides properties such as the AfterEvent option, which specifies whether the task should succeed, fail, or keep querying if the condition is met. You also need to provide a length of time after which the WMI Event Watcher stops watching by setting the Timeout property. The timeout value is in seconds. The default of zero (0) indicates that there is no timeout. Outside of your development activities, be very careful with leaving this setting on zero (0). The WMI Event Watcher could leave your SSIS package running indefinitely.
You can also configure what happens when a timeout occurs under the ActionAtTimeout and AfterTimeout settings. The NumberOfEvents option configures the number of events to watch for. You can use this setting to look for more than one file before you start processing.
The WqlQuerySource for the File Watcher Configuration for this WMI Task would look like this code:
SELECT * FROM __InstanceCreationEvent WITHIN 10
WHERE TargetInstance ISA “CIM_DirectoryContainsFile”
AND TargetInstance.GroupComponent =
If you run this task with no files in the C:\ProSSIS\ directory, the task will remain yellow as the watcher continuously waits for an event to be raised. If you copy a file into the directory, the task will turn green and complete successfully. This is a great addition that is less resource-intensive than the legacy DTS version of iterating in a For loop until the file is found. As you can see, there are some major improvements in the capabilities to control workflow in SSIS.