Programming and Extending SSIS
Once you start implementing a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet. For instance, you may have a legacy system that has a proprietary export file format, and you need to import that data into your warehouse. You have a robust SSIS infrastructure that you have put in place that enables you to efficiently develop and manage complex ETL solutions, but how do you meld that base infrastructure with the need for customization? That’s where custom component development comes into play. Out of the box, Microsoft provides a huge list of components for you in SSIS; however, you can augment those base components with your own more specialized tasks.
The benefit here is not only to businesses but also to software vendors. You may decide to build components and sell them on the Web, or maybe start a community-driven effort on a site such as codeplex.com. Either way, the benefit you get is that your components will be built in exactly the same way that the ones that ship with SSIS are built; there is no secret sauce (besides expertise) that Microsoft adds to its components to make them behave any differently from your own. This gives you the opportunity to truly “build a better mousetrap” — if you don’t like the way that one of the built-in components behaves, you can simply build your own instead.
Building your first component may be a little challenging, but with the help of this chapter it is hoped that you will be able to overcome this. This chapter focuses on the pipeline — not because it is better than any other area of programmability within SSIS but because it will probably be the area in which you have the most benefit to gain, and it does require a slightly greater level of understanding of basic programming concepts than most SSIS developers need to create script tasks. The pipeline also gives you a glimpse of the really interesting things that Microsoft has done in SSIS. All forms of extensibility are well covered in the SQL Server documentation and samples, so don’t forget to leverage those resources as well.
The Sample Components
Three sample components are defined in this section to demonstrate the main component types. The Transform Component is expanded in Chapter 20 to include a user interface. All code samples are available on the website for this tutorial, which you can find at WWW.WROX.COM/GO/PROSSIS2014.
The pipeline, for all intents and purposes, is the way your data moves from A to B and how it is manipulated, if at all. You can find it on the Data Flow tab of your packages after you have dropped a Data Flow Task into the Control Flow. There’s no need to go into any more detail about where to find the pipeline in your package, because this has been covered elsewhere in this tutorial.
As discussed in other chapters, Integration Services enables you to use three basic component types in the pipeline. The first component type is a Source, which retrieves data from an external location (for instance, a SQL Server query, a text file, or a Web service) and transforms the input into the internal buffer format that the pipeline expects.
The Transformation-Type Component accepts buffers of data from the pipeline on one side, does something useful with the data (for instance, sorting it, calculating totals, or multicasting the rows), and then pushes the rows downstream for the next component to consume.
The Destination-Type Component also accepts buffers of data from the pipeline on its input, but rather than write the output rows to the pipeline again, it writes them to a specific external source, such as a text file or SQL Server table.
This chapter walks you through the process of building three components — one example for each of the component types just mentioned. Note that there are further classifications of components, such as synchronous and asynchronous components, but this chapter will help you get the basics right. Following is a high-level description of what each sample component will do.
Component 1: Source Adapter
The Source adapter needs to be able to do quite a few things in order to present the data to the downstream components in the pipeline in a format that the next component understands and expects. Here is a list of what the component needs to do:
As previously indicated, this component needs to do some work in order to present its data to the outside world. However, stick with it and you’ll see how easy this can be. Your aim in the Source adapter is to be able to take a file with a custom format, read it, and present its data to the downstream components.
The real-world sample scenario that we will cover is that there are many systems that export data in a proprietary format, which is hard to then import into another system. Assume that the legacy system exports customer data in the following format:
As you can see, this is a nonstandard format that none of the out-of-the-box Source adapters could deal with adequately. Of course, you could use a Script Component to read and parse the file using VB or C#, but then you would need to duplicate the code in every package that needed to read this type of file. Writing a custom Source Component means that you can reuse the component in many different packages, which saves you time and simplifies maintenance compared to the scripting route.
Component 2: Transform
A transform component enables you to take data from a source, manipulate it, and then present the newly arranged data to the downstream components. This component performs the following tasks:
In the scenario we will use for this component, we want to create a simple data obfuscation device that will take data from the source and reverse the contents. The catch, though, is that the column properties must be set correctly, and you can perform this operation only on certain data types.
Component 3: Destination Adapter
The Destination adapter will take the data received from the upstream component and write it to the destination. This component needs to do the following:
For this component, we will use the opposite scenario to the one used earlier in the chapter for the Source adapter. Instead of retrieving data from source to be used in the SSIS package, in this case, we will imagine that the pipeline retrieved data from some standard source (such as SQL Server) but we now want to write the data out to a custom flat file format, perhaps as the input file for a legacy system.
The Destination adapter will basically be a reverse of the Source adapter. When it receives the input rows, it needs to create a new file with a data layout resembling that of the source file.
The components you’ll build are actually quite simple, but the point is not complexity but how you use the methods in Microsoft’s SSIS object model. You can use the methods presented for tackling these tasks as the basis for more complex operations.
Get Updates on Tech posts, Interview & Certification questions and training schedules