The Pipeline Component Methods
Components are normally described as having two distinct phases: design time and runtime. The design-time phase refers to the methods and interfaces that are called when the component is being used in a development environment — in other words, the code that is being run when the component is dragged onto the SSIS design surface, and when it is being configured. The runtime functionality refers to the calls and interfaces that are used when the component is actually being executed — in other words, when the package is being run.
When you implement a component, you inherit from the base class, Microsoft.SqlServer.Dts.Pipeline.PipelineComponent, and provide your own functionality by overriding the base methods, some of which primarily design time, others runtime. If you are using native code to write SSIS components, then the divide between the runtime and the design time is clearer because the functionality is implemented on different interfaces. Commentary on the methods has been divided into these two sections, but there are some exceptions, notably the connection-related methods; a section on connection time–related methods is included later in this chapter.
NOTE In programming terms, a class can inherit functionality from another class, termed the base class. If the base class provides a method, and the inheriting class needs to change the functionality within this method, it can override the method. In effect, you replace the base method with your own. From within the overriding method, you can still access the base method, and call it explicitly if required, but any consumer of the new class will see only the overriding method.
The following methods are explicitly implemented for design time, overriding the PipelineComponent methods, although they will usually be called from within your overriding method. Not all of the methods are listed, because for some there is little more to say, and others have been grouped together according to their area of function. Refer to the SQL Server documentation for a complete list.
Some methods are described as verification methods, and these are a particularly interesting group. They provide minor functions, such as adding a column or setting a property value, and you might quite rightly assume that there is little point in ever overriding them because there isn’t much value to add to the base implementation. These verification methods have code added to verify that the operation about to take place within the base class is allowed. The following sections expand on the types of checks you can do; if you want to build a robust component, these are well worth looking into.
Another very good reason to implement these methods as described is to reduce code. These methods will be used by both a custom user interface (UI) and the built-in component editor, or Advanced Editor. If you raise an error saying that a change is not allowed, then both user interfaces can capture this and provide feedback to the user. Although a custom UI would be expected to prevent blatantly inappropriate actions, the Advanced Editor is designed to offer all functionality, so you are protecting the integrity of your component regardless of the method used.
This method is provided so you can set up your component. It is called when a component is first added to the Data Flow, and it initializes the component. It does not perform any column-level activity, because this is left to ReinitializeMetadata; when this method is invoked, there are generally no inputs or outputs to be manipulated anyway. Following are the sorts of procedures you may want to set in here:
Validate is called numerous times during the lifetime of the component, both at design time and at runtime, but the most interesting work is usually the result of a design-time call. As the name suggests, it validates that the content of the component is correct and will enable you to at least run the package. If the validation encounters a problem, then the return code used is important to determine any further actions, such as calling ReinitializeMetaData. The base class version of Validate performs its own checks in the component, and you will need to extend it further in order to cover your specific needs. Validate should not be used to change the component at all; it should only report the problems it finds.
The ReinitializeMetaData method is where all the building work for your component is done. You add new columns, remove invalid columns, and generally build up the columns. It is called when the Validate method returns VS_NEEDSNEWMETADATA. It is also your opportunity to do any component repairs that need to be done, particularly regarding invalid columns, as mentioned previously.
MapInputColumn and MapOutputColumn
These methods are used to create a relationship between an input/output column and an external metadata column. An external metadata column is an offline representation of an output or input column and can be used by downstream components to create an input. For instance, you may connect your Source Component to a database table to retrieve the list of columns. However, once you disconnect from the database and edit the package in an offline manner, it may be useful for the source to “remember” the external database columns.
This functionality enables you to validate and maintain columns even when the Data Source is not available. It is not required, but it makes the user experience better. If the component declares that it will be using External Metadata (IDTSComponentMetaData100.ValidateExternalMetadata), then the user in the advanced UI will see upstream columns on the left and the external columns on the right; if you are validating your component against an output, you will see the checked list box of columns.
Input and Output Verification Methods
There are several methods you can use to deal with inputs and outputs. The three functions you may need to perform are adding, deleting, and setting a custom property. The method names clearly indicate their function:
For most components, the inputs and outputs will have been configured during ProvideComponentProperties, so unless you expect a user to add additional inputs and outputs and fully support this, you should override these methods and fire an error to prevent this. Similarly, unless you support additions, you would also want to deny deletions by overriding the corresponding methods. Properties can be checked for validity during the Set methods as well.
Set Column Data Types
Two methods are used to set column data types: one for output columns and the other for external metadata columns. There is no input column equivalent, because the data types of input columns are determined by the upstream component.
These are verification methods that can be used to validate or prevent changes to a column. For example, in a Source Component, you would normally define the columns and their data types within ReinitializeMetaData. You could then override SetOutputColumnDataTypeProperties, and by comparing the method’s supplied data types to the existing column, you could prevent data type changes but allow length changes.
There is quite a complex relationship between all the parameters for these methods; please refer to SQL Server documentation for reference when using this method yourself.
This method enables you to update an existing version of the component with a new version in a transparent manner on the destination machine.
This method enables you to register custom events in a Pipeline Component. You can therefore have an event fire on something happening at runtime in the package. This is then eligible to be logged in the package log.
This method decides which of the new custom events are going to be registered and selectable in the package log.
In the ProvideComponentProperties method, you tell the component about any custom properties that you would like to expose to the users of the component and perhaps allow them to set. Using the SetComponentProperty verification method, you can check what the user has entered for which custom property on the component and ensure that the values are valid.
Setting Column Properties
There are three column property methods, each of which enables you to set a property for the relevant column type:
These are all verification methods and should be used accordingly. For example, if you set a column property during ReinitializeMetaData and want to prevent users from interfering with this, you could examine the property name (or index) and throw an exception if it is restricted property, in effect making it read-only.
Similarly, if several properties are used in conjunction with each other at runtime to provide direction on the operation to be performed, you could enumerate all column properties to ensure that those related properties exist and have suitable values. You could assign a default value if a value is not present or raise an exception depending on the exact situation.
For an external metadata column, which will be mapped to an input or output column, any property set directly on this external metadata column can be cascaded down onto the corresponding input or output column through this overridden function.
This method deals with the columns on inputs into the component. In a nutshell, you use it to select a column and to tell the component how you will treat each column. What you see coming into this method is the virtual input. This means that it is a representation of what is available for selection to be used by your component. These are the three possible usage types for a column:
This is another of the verification methods, and you should use it to ensure that the columns selected are valid. For example, the Reverse String sample shown later in the chapter can operate only on string columns, so you must check that the data type of the input column is DT_STR for string or DT_WSTR for Unicode strings. Similarly, the component performs an in-place change, so the usage type must be read/write. Setting it to read-only would cause problems during execution when you try to write the changed data back to the pipeline buffer. The Data Flow makes important decisions on column handling based on the read/write flag, and if the component writes to a read-only column, it will likely corrupt the data and the user will get incorrect results. Therefore, you should validate the columns as they are selected to ensure that they meet the design requirements for your component.
On Path Attachment
There are three closely related path attachment methods, called when the named events occur, and the first two, in particular, can be used to improve the user experience:
These methods handle situations in which, for instance, the inputs or outputs are all identical and interchangeable. Using the multicast as an example, you attach to the dangling output and another dangling output is automatically created. You detach, and the extra output is deleted.
Runtime, also known as execution time, is when you actually work with the data, through the pipeline buffer, with columns and rows of data. The following methods are used for preparing the component, doing the job it was designed for, and then cleaning up afterwards.
This method, which is similar to the PreExecute method described next, can be used for setting up anything in the component that you will need at runtime. The difference between them is that you do not have access to the Buffer Manager, so you cannot get your hands on the columns in either the output or the input at this stage. Otherwise, the distinction between the two is very fine, so usually, you will end up using PreExecute exclusively because you will need access to the Buffer Manager anyway.
PreExecute is called once and once only each time the component is run, and Microsoft recommends that you do as much preparation as possible for the execution of your component in this method. In this case, you’ll use it to enumerate the columns, reading off values and properties, calling methods to get more information, and generally preparing by gathering all the information you require in advance. For instance, you may want to save references to common properties, column indexes, and state information to a variable so that you access it efficiently once you start pumping rows through the component.
This is the earliest point in the component that you will access the component’s Buffer Manager, so you have the live context of columns, as opposed to the design-time representation. As mentioned, you do the column preparation for your component in this method, because it is called only once per component execution, unlike some of the other runtime methods, which are called multiple times.
The live and design-time representations of the columns may not match. An example of this is that typically data in the buffer is not in the same order as it is in the design time. This means that just because Column1 has an index of 0 in the design time, the buffer may not have the same index for Column1. To solve this mismatch there is a function named FindColumnByLineageID that will be used to locate the columns in the buffer. An example of FindColumnByLineageID can be found later in the chapter in the “Building the Source Component” section in the ParseTheFileAndAddToBuffer function.
PrimeOutput and ProcessInput
These two methods are covered together because they are so closely linked. Essentially, these two methods reflect how the data flows through components. Sometimes you use only one of them, and sometimes you use both. There are some rules you can follow.
In a Source adapter, the ProcessInput method is never called, and all of the work is done through PrimeOutput. In a Destination adapter, the reverse is true; the PrimeOutput method is never called, and the whole of the work is done through the ProcessInput method.
Things are not quite that simple with a transformation. There are two types of transformations, and the type of transformation you are writing will dictate which method, or indeed methods, your component should call. For a discussion on synchronous versus asynchronous transformations, see Chapter 4.
You can use this method to clean up anything that you started in PreExecute. However, this is not its only function. After reading the description of the Cleanup method, covered next, you may wonder what the difference is between them. The answer is, for this release, nothing. It might be easiest to think of PostExecute as the counterpart to PreExecute.
As the method name suggests, this is called as the very last thing your component will do, and it is your chance to clean up whatever resources may be left. However, it is rarely used. Like PreExecute and PostExecute, you can consider Cleanup to be the opposite of PrepareForExecute.
If you are using an error output and directing rows there in case of errors, then you should expose this method to provide more information about the error. When you direct a row to the error output, you specify an error code. This method will be called by the pipeline engine, passing in that error code, and it is expected to return a full error description string for the code specified. These two values are then included in the columns of the error output.
The following two methods are called several times throughout the life cycle of a component, both at design time and at runtime, and are used to manage connections within the component.
This method is called both at design time and when the component executes. There is no explicit result, but the connection is normally validated and then cached in a member variable within the component for later use. At this stage, a connection should be open and ready to use.
There is a single parameter used by the AcquireConnections, which is named transaction. The transaction parameter is set to the transaction that the connection being retrieved in AcquireConnections is participating in. The transaction parameter is set to null unless a transaction has been started by the SSIS execution engine. An example of where the transaction object would not be null would be to set the SSIS package setting of TransactionOption to Required.
If you have any open connections, as set in the AcquireConnections method, then this is where they should be closed and released. If the connection was cached in a member variable, use that reference to issue any appropriate Close or Dispose of methods. For some connections, such as a File Connection Manager, this may not be relevant because only a file path string was returned, but if you took this a stage further and opened a text stream or similar on the file, it should now be closed.
Here is a list of common Connection Managers and the values that are returned.
|SCCM||SQL Server DBA|
|Team Foundation Server||BizTalk Server Administrator|