Developing a Custom SSIS Source Component

The Components

Now you can move on to actually building the components. These components are simple and demonstrate the most commonly used methods when building your own components. They also help give you an idea of the composition of a component, the order in which things happen, and which method does what. While they will not implement all the available methods, these components have been built and can be extended, so why not download them and give them a go? If you happen to break them, simply revert back to a previous good copy. No programmer gets things right the first time, so breaking the component is part of the experience. (At least, that’s what programmers tell themselves at 2:00 a.m. when they are still trying to figure out why the thing isn’t doing what they wanted.)

The component classes are covered in the next sections. You will then be shown how to ensure that your component appears in the correct folder, what to put in the AssemblyInfo file, how it is registered in the global assembly cache (GAC), and how to sign the assembly. This is common to all three components, so it is covered as one topic.


In this section of the chapter, you’ll go through the steps that are common to all the Pipeline Components. These are the basic sets of tasks you need to do before you fly into coding.

Start by opening Visual Studio 2013, and create a new project, a Class Library project, as shown in Figure 19-1.

Now select the Add Reference option from the Project menu, and select the following assemblies, which are shown in Figure 19-2:

1. Microsoft.SqlServer.DTSPipelineWrap
2. Microsoft.SqlServer.DTSRuntimeWrap
3. Microsoft.SqlServer.ManagedDTS
4. Microsoft.SqlServer.PipelineHost

In a standard installation of SQL Server 2014 Integration Services, these reference files should be available in the directory C:Program FilesMicrosoft SQL Server120SDKAssemblies; 64-bit machines need to look in the x86 program files directory C:Program Files (x86)Program FilesMicrosoft SQL Server120SDKAssemblies.

Once you have those set up, you can start to add the Using directives. These directives tell the compiler which libraries you are going to use. Here are the directives you will need to add to the top of the class in the code editor:

The first stage in building a component is to inherit from the PipelineComponent base class and decorate the class with DtsPipelineComponent. From this point on, you are officially working on a Pipeline Component.

The DtsPipelineComponent attribute supplies design-time information about your component, and the first key property here is ComponentType. The three options — Source, Destination, or Transformation — reflect the three tabs within the SSIS Designer Toolbox. This option determines which tab, or grouping of components, your component belongs to. The DisplayName is the text that will appear in the Toolbox, and it’s the default name of the component when it’s added to the designer screen. The IconResource is the reference to the icon in your project that will be shown to the user both in the Toolbox and when the component is dropped onto the Package Designer. This part of the code will be revisited later in the chapter when the attribute for the User Interface, which you’ll be building later, is added.

Now type the following in the code window:

public override

After you press the space bar after the word “override,” you’ll see a list of all the methods on the base class. You are now free to type away to your heart’s content and develop the component.

We cover development of the components a little later in the chapter, but for now we focus on how you deploy it into the SSIS environment once it is ready. In addition to being built, the component also needs a few other things to happen to it. If you are a seasoned developer, then this section will be old hat to you, but for anyone else, it’s important to understand what needs to happen for the components to work:

1. Provide a strong name key for signing the assembly.
2. Set the build output location to the PipelineComponents folder.
3. Use a post-build event to install the assembly into the GAC.
4. Set assembly-level attributes in the AssemblyInfo.cs file.

SSIS needs the GAC because components can be executed from within the SQL Server Data Tools, SQL Server Agent, or DTExec, all which reside in different directories. Strong names are a consequence of this requirement. The PipelineComponents folder allows the designer to discover the component and put it in the Toolbox. Additional assembly-level metadata from your component is a consequence of the fact that the strong name, including version, is persisted in the package, which causes all your packages to break if you rebuild the component unless you stop incrementing the AssemblyVersion attribute of the component.

To sign the project, right-click your C# project and choose Properties from the context menu. You are not going to look at all the tabs on the left side of the screen, only those that are relevant to what you are doing here. Figure 19-3 shows the Application tab.

In this tab, the only thing you really need to do is change the assembly name to be the same as your default namespace.

In order for the SSIS designer to use a component, it must be placed in a defined folder. On a 32-bit (x86) PC, this folder is usually located here:

C:Program FilesMicrosoft SQL Server120DTSPipelineComponents

If you have a 64-bit (x64) environment, then you should explicitly choose the 64-bit location:

For your new component to work correctly, it must be placed in the global assembly cache. To copy the assemblies into the program files directory and install them with the GAC, you’re going to use a post-build event on the project. Copying the files into the preceding directories and installing them into the GAC are both required steps. You can do this manually, but it makes for faster development if you do it as part of the build process.

Click the Build Events tab. Then, in the Post-build Command Line dialog, enter the commands that will automatically do these tasks. You can also click on the “Edit Post Build . . .” button, which allows for better readability on the longer command lines. Following is an example post-build event command (see also in below screen shot). Be sure to include the double quotes in the path statements. If this exact command doesn’t work in your development environment, then do a search for gacutil.exe on your hard drive and use its path in the manner shown here.

NOTE Note that the path shown earlier is for Windows 8/2012 machines. The path on older OSs may be slightly different.

The SSIS components referenced in this project are built with .NET 4.5. This means you need to verify that you’re using the gacutil.exe file for .NET 4.5 as well. Otherwise, the gacutil steps of the post-build event will fail.

When you compile the code, Visual Studio will expand the macros shown in the preceding snippet into real paths — for instance, the first command shown will expand into the following statement, and Visual Studio will then execute it. Because you have declared the statement in the post-build event, after compiling the code the statement will run automatically and place the new library (.dll) into the GAC

The assembly is to be installed in the GAC, so you also need to sign it using a strong name key, which can be specified and created from the Signing page, shown in below screen shot.

NOTE Keep in mind that on some operating systems, such as Windows 7 or 8, you will be required to run the Wrox.Pipeline project in Visual Studio 2013 under the administrator account. That’s because the gacutil utility requires administrator privileges to copy files to the GAC. To run Visual Studio as the administrator, right-click the Visual Studio icon to bring up the context menu and select “Run as administrator.”

That’s all you need to do as far as the project’s properties are concerned, so now you can move on to handling the AssemblyInfo file. While most assembly attributes can be set through the Assembly Information dialog, available from the Application tab of Project Properties, you require some additional settings. Shown next is the AssemblyInfo.cs file for the example project, which can be found under the Properties folder within the Solution Explorer of Visual Studio:

The first section of attributes represents general information such as company name. The AssemblyCulture attribute should be left blank unless you are experienced with working with localized assemblies and understand the implications of any change.

The AssemblyVersion attribute is worth noting; because the version is fixed, it does not use the asterisk token to generate an automatically incrementing build number. The assembly version forms part of the fully qualified assembly name, which is how a package references a component under the
covers. Therefore, if you changed the version for every build, you would have to rebuild your packages for every new version of the component. In order to differentiate between versions, you should use AssemblyFileVersion, which you need to manually update.

The other attribute worth noting is CLSCompliant. Best practice dictates that the .NET classes and assemblies conform to the Command Language Specification (CLS), and compliance should be marked at the assembly level. Individual items of noncompliant code can then be decorated with the CLSCompliant attribute, marked as false. The completed samples all include this, and you can also refer to SQL Server documentation for guidance, as well as follow the simple compiler warnings that are raised when this condition is not met.
The following example shows how to deal with a noncompliant method in your component:

Building the Source Component

As mentioned earlier, the Source adapter needs to be able to retrieve information from a file and present the data to the downstream component. The file is not your standard-looking file, and while the format is strange, it’s consistent. When you design the Destination adapter, you will write the contents of an upstream component to a file in a very similar format. After you have read this chapter, you may want to take the Source adapter and alter it slightly so that it can read a file produced by the sample Destination adapter.

If you’re following along with the tutorial and are writing the code manually, right click on the Wrox.Pipeline project in Visual Studio and click “Add ⇒ New Item”. Select the “Class” template in the “Add New Item” dialog and create a new file named SimpleFileProcessorSource.cs.

The first method to look at is ProvideComponentProperties. This is called almost as soon as you drop the component onto the designer. Here is the method in full (SimpleFileProcessorSource.cs) before you look at its parts:

Now you can break down some of this code. The first thing the preceding code does is remove any runtime connections in the component, which you’ll be adding back soon:

You can also remove inputs, outputs, and custom properties. Basically your component is now a clean slate. This is not strictly required for this example; however, it’s advantageous to follow this convention because it prevents any unexpected situations that may arise in more complicated components.

The following three lines of code simply help to identify your component when you look in the property pages after adding it to the designer:

The only property here that may not be obvious is ContactInfo, which simply identifies to the user the developer of the component. If a component throws a fatal error during loading or saving — for example, areas not influenced by the user-controlled settings — then the designer will show the contact information for support purposes.

Next, your component needs a runtime connection from which you can read and get the data:

You removed any existing connections earlier in the method, so here is where you add it back. Simply give it a name and a description.

Downstream components will see the data by having it presented to them from an output in this component. In other words, the output is the vehicle that the component uses to present data from the input file to the next component downstream. Here you add a new output to the output collection and give it a name and a description:

The final part of this component is to use ExternalMetadataColumns, which enables you to view the structure of the Data Source with no connection:
output.ExternalMetadataColumnCollection.IsUsed = true;
Here, you tell the output you created earlier that it will use ExternalMetaData columns.

This method covers a lot of ground and is really quite interesting. The first thing you want to do is find out if you can get a Connection Manager from the runtime connection collection of the component. The runtime connection was defined during ProvideComponentProperties earlier. If it is null, then the user has not provided a runtime connection:

The next line of code is quite cool. It converts the native ConnectionManager object to a managed Connection Manager. You need the managed Connection Manager to determine what type it is and the properties:

Once you have the managed Connection Manager, you can start to look at some of its properties to ensure that it is what you want. All Connection Managers have a CreationName property. For this component, you want to ensure that the CreationName property is FILE, as highlighted here:

If the CreationName is not FILE, then you send an exception back to the component:

You’ve established that a connection has been specified and that it is the right type. However, the FILE Connection Manager can still have the wrong usage mode specified. To determine whether it has the right mode, you have to look at another of its properties, the FileUsageType property. This can return one of four values, defined by the DTSFileConnectionUsageType enumeration:

1. DTSFileConnectionUsageType.CreateFile: The file does not yet exist and will be created by the component. If the file does exist, then you can raise an error, although you may also accept this and overwrite the file. Use this type for components that create new files. This mode is more useful for Destination Components, not sources.
2. DTSFileConnectionUsageType.FileExists: The file exists, and you are expected to raise an error if this is not the case.
3. DTSFileConnectionUsageType.CreateFolder: The folder does not yet exist and will be created by the component. If the folder does exist, then you can decide how to handle this situation, as with CreateFile earlier. This is also more useful for destinations.
4. DTSFileConnectionUsageType.FolderExists: The folder exists, and you are expected to raise an error if this is not the case.

The type you want to check for in your component is DTSFileConnectionUsageType.FileExists and you do that like this, throwing an exception if the type is not what you want:

You’re nearly done checking your Connection Manager now. At this point, you need the filename so you can retrieve the file later when you need to read it for data. You do that like this:

That concludes the AcquireConnections method, so you can now move straight on to the Validate method (SimpleFileProcessorSource.cs):

The first thing this method does is check for an input. If it has an input, it raises an error back to the component using the FireError method and returns DTSValidationStatus.VS_ISCORRUPT. This is a Source adapter, and there is no place for an input. Because the data rows enter the component from the file, there is no need for a buffer input that would receive data from an upstream component.

if (ComponentMetaData.InputCollection.Count != 0)

Next, you check whether the user has specified a Connection Manager for your component. If not, then you return to the user a message indicating that a Connection Manager is required. You do this through the FireError method again. If no Connection Manager is specified, then you tell the component it is broken. Remember that you must perform the validation of any Connection Manager specified in AcquireConnections().

Now you need to check whether the output has any columns. For the initial drop onto the designer, the output will have no columns. If this is the case, the Validate() method will return DTSValidationStatus.VS_NEEDSNEWMETADATA, which in turn calls ReinitializeMetaData. You will see later what happens in that method.

If the output has output columns, then one of the things you want to check is whether the output columns have an ExternalMetaDataColumn associated with them. Recall that for ProvideComponentProperties, it was stated that you would use an ExternalMetadataColumnCollection. Therefore, for each output column, you need to ensure that there is an equivalent external metadata column and that the data type properties also match:

The next method is used to validate some properties of the component. This is not a method provided by Microsoft that you are overriding; rather, it is completely custom code used to help you do some common work, which is why such functions are sometimes called helper methods. This rather longnamed helper method, DoesEachOutputColumnHaveAMetaDataColumnAndDoDatatypesMatch, accepts as a parameter the ID of an output, so you pass in the output’s ID. This method has to do two things. First, it has to confirm that each output column has an ExternalMetadataColumn associated with it; second, it has to ensure that the two columns have the same column data type properties. Here is the method in full (SimpleFileProcessorSource.cs):

The first thing this method does is translate the ID passed in as a parameter to the method into an output:
IDTSOutput100 output =

Once you have that, the code loops over the output columns in that output to determine whether the ExternalMetadataColumnID associated with that output column has a value of 0 (that is, there is no value). If the code finds an instance of a value, then it sets the return value from the method to be false:

If all output columns have a nonzero ExternalMetadataColumnID, then you move on to the second test:

In this part of the method, you are checking whether all the attributes of the output column’s data type match those of the corresponding ExternalMetadataColumn. If they do not, then again you return false from the method, which causes the Validate() method to call ReinitializeMetaData. Notice that you use the ID rather than a name, as names can be changed by the end user.

ReinitializeMetaData is where a lot of the work happens in most components. In this component, it fixes up the output columns and the ExternalMetadataColumns. Here’s the method

This is a really simple way of doing things. Basically, you are removing all the ExternalMetaDataColumns and then removing the output columns. You will then add them back using the CreateOutputAndMetaDataColumns helper method.

NOTE As an exercise, you may want to see if you can determine which columns actually need fixing, instead of just dropping and recreating them all.

CreateOutputAndMetaDataColumns is a helper method that creates the output’s output columns and the ExternalMetaData columns to go with them (SimpleFileProcessorSource.cs). This implementation is very rigid, and it presumes that the file you get will be in one format only:

This code follows the same pattern for every column you want to create, so you’ll just look at one example here because the rest are variations of the same code. In CreateOutputAndMetaDataColumns, you first need to create an output column and add it to the OutputColumnCollection of the output, which is a parameter to the method. You give the column a name, a description, and a data type, along with details about the data type:

NOTE SetDataTypeProperties takes the name, the length, the precision, the scale, and the code page of the data type. A list of what is required for these fields can be found in tutorials Online.

Note that if you decided to use Unicode data, which does not require a code page, then the same call would have looked like this:

outName.SetDataTypeProperties(DataType.DT_WSTR, 50, 0, 0, 0);

You now create an ExternalMetaDataColumn for the OutputColumn, and you do that by calling the helper method called CreateExternalMetaDataColumn. This method takes as parameters the ExternalMetaDataColumnCollection of the output and the column for which you want to create an ExternalMetaDataColumn:


The first thing you do in the method is create a new ExternalMetaDataColumn in the ExternalMetaDataColumnCollection that was passed as a parameter. You then map the properties of the output column that was passed as a parameter to the new ExternalMetaDataColumn. Finally, you create the relationship between the two by assigning the ID of the ExternalMetaDataColumn to the ExternalMetadataColumnID property of the output column:

At this point, the base class will call the MapOutputColumn method. You can choose to override this method to ensure that the external columns and the output column match and that you want to allow the mapping to occur, but in this case you should leave the base class to simply carry on.

Now it’s time to look at the runtime methods. PreExecute is the usual place to start for most components, but it is done slightly differently here. Normally you would enumerate the output columns and enter them into a struct, so you could easily retrieve them later. For illustration purposes, you’re not going to do that here (but you do this in the Destination adapter, so you could port what you do there into this adapter as well). The only method you are interested in with this adapter is PrimeOutput. Here is the method in full:

On the face of it, this method looks really easy, but as you can see, all the work is being done by the helper method called ParseTheFileAndAddToBuffer. To that procedure you need to pass the filename you retrieved in AcquireConnections, and the buffer is buffers[0] because there is only one buffer and the collection is zero-based. You’ll look at the ParseTheFileAndAddToBuffer method in a moment, but the last thing you do in this method is call SetEndOfRowset on the buffer. This basically tells the downstream component that there are no more rows to be retrieved from the adapter.

Now consider the ParseTheFileAndAddToBuffer method in a bit more detail (SimpleFileProcessorSource.cs):

Because this is not a lesson in C# programming, we will simply describe the points relevant to SSIS programming in this component. You start off by getting references to the output columns collection in the component:

The IDTSOutputColumn100 object will be used when you need a reference to particular columns. At this point, the columns in the file are actually in rows, so you need to pivot them into columns. First, you read a single line from the file using this code:
string s = tr.ReadLine();
For this specific source file format, you can determine that you need to add a new row to the buffer if when reading a line of text from the file it begins with the word . You do that in the code shown here (remember that the variable s is assigned a line of text from the file):

Here, you have added a row to the buffer, but the row is empty (contains no data yet). As you read lines in the file, you test the beginning of each line. This is important because you need to know this in order to be able to grab the right column from the output columns collection and assign it the value from the text file. The first column name you test for is the “Name” column:

The first thing you do here is check what the row begins with. In the preceding example, it is “Name:”. Next, you set the IDTSColumn100 variable column to reference the Name column in the OutputColumnCollection. You need to be able to locate the column in the buffer, and to do this you need to look at the Buffer Manager. This has a method called FindColumnByLineageID that returns the integer location of the column. You need this to assign a value to the column.

To this method, you pass the output’s buffer and the column’s LineageID. Once you have that, you can use the SetString method on the buffer object to assign a value to the column by passing in the buffer column index and the value to which you want to set the column. Now you no longer have an empty row; it has one column populated with a real data value.

You pretty much do the same with all the columns for which you want to set values. The only variation is the method you call on the buffer object. The buffer object has a method for each of the possible data types. In this component, you need a SetInt32, a SetBoolean, and a SetDecimal method. They do not differ in structure from the SetString method at all. You can also set the value in a non-type-safe manner by using buffer[i] := value, though as a best practice this is not advised.

You can now compile the project. Assuming there are no syntax errors, the project should output the .dll in the specified folder and register it in the GAC. When SQL Server Data Tools is opened, the component should show up in the SSIS Toolbox toolbar as well.


Get Updates on Tech posts, Interview & Certification questions and training schedules