Lookup Transformation Features
The Lookup Transformation allows you to populate the cache using a separate pipeline in either the same or a different package. You can use source data from any location that can be accessed by the SSIS Data Flow. This cache option makes it convenient to load a file or table into memory, and this data can be used by multiple Data Flows in multiple packages. Prior to SQL Server 2008, you needed to reload the cache every time it was used. For example, if you had two Data Flow Tasks in the same package and each required the same reference data set, each Lookup Transformation would load its own copy of the cache separately. You can persist the cache to virtual memory or to permanent file storage. This means that within the same package, multiple Lookup Transformations can share the same cache. The cache does not need to be reloaded for multiple Data Flows or if the same Data Flow is executed multiple times during a package execution, such as when the Data Flow Task is executed within a Foreach Loop Container. You can also persist the cache to a file and share it with other packages. The cache file format is optimized for speed; it can be much faster than reloading the reference data set from the original relational source.
Another enhancement in the Lookup Transformation in SQL Server 2008 SSIS is the miss-cache feature. In scenarios where the component is configured to perform the Lookups directly against the database, the misscache feature enables you to optimize performance by optionally loading into cache the rows without matching entries in the reference data set. For example, if the component receives the value 123 in the incoming pipeline, but there are no matching entries in the reference data set, the component will not try to find that value in the reference data set again. In other words, the component “remembers” which values it did not find before. You can also specify how much memory the miss-cache should use (expressed as a percentage of the total cache limit, by default 20%). This reduces a redundant and expensive trip to the database. The miss-cache feature alone can contribute to performance improvement especially when you have a very large data set.
In the 2005 version of the component, the Lookup Transformation had only two outputs — one for matched rows and another that combined nonmatches and errors. However, the latter output caused much dismay with SSIS users — it is often the case that a nonmatch is not an error and is in fact expected. In 2008 and later the component has one output for nonmatches and a separate output for true errors (such as truncations). Note that the old combined output is still available as an option for backward compatibility. This combined error and nonmatching output can be separated by placing a Conditional Split Transformation after the Lookup, but it is no longer necessary because of the separate outputs.
Note: To troubleshoot issues you may have with SSIS, you can add Data Viewers into a Data Flow on the lines connecting the components. Data Viewers give you a peek at the rows in memory. They also pause the Data Flow at the point the data reaches the viewer.