Merge Join Transformation in SSIS
The Merge Join Transformation in SSIS enables you to perform an inner or outer join operation in a streaming fashion within the SSIS Data Flow. The Merge Join Transformation does not preload data like the Lookup Transformation does in its cached mode. Nor does it perform per-record database queries like the Lookup Transformation does in its noncached mode. Instead, the Merge Join Transformation accepts two inputs, which must be sorted, and produces a single output, which contains the selected columns from both inputs, and uses the join criteria defined by the package developer.
The component accepts two sorted input streams and outputs a single stream that combines the chosen columns into a single structure. It is not possible to configure a separate nonmatched output like the one supported by the Lookup Transformation. For situations in which unmatched records need to be processed separately, a Conditional Split Transformation can be used to find the null values on the nonmatched rows and send them down a different path in the Data Flow.
The Merge Join Transformation differs from the Lookup Transformation in that it accepts its reference data via a Data Flow path instead of through direct configuration of the transformation properties. Both input Data Flow paths must be sorted, but the data can come from any source supported by the SSIS Data Flow as long as they are sorted. The sorting has to occur using the same set of columns in exactly the same order, which can create some overhead upstream.
The Merge Join Transformation typically uses less memory than the Lookup Transformation because it maintains only the required few rows in memory to support joining the two streams. However, it does not support short-circuit execution, in that both pipelines need to stream their entire contents before the component considers its work done. For example, if the first input has five rows, and the second input has one million rows, and it so happens that the first five rows immediately join successfully, the component will still stream the other 999,995 rows from the second input even though they cannot possibly be joined anymore.