Cache Connection Manager and Cache Transform in SSIS
The Cache Connection Manager (CCM) and Cache Transform enable you to load the Lookup cache from any source. The Cache Connection Manager is the more critical of the two components — it holds a reference to the internal memory cache and can both read and write the cache to a disk-based file. In fact, the Lookup Transformation internally uses the CCM as its caching mechanism.
Like other Connection Managers in SSIS, the CCM is instantiated in the Connection Managers pane of the package design surface. You can also create new CCMs from the Cache Transformation Editor and Lookup Transformation Editor. At design time, the CCM contains no data, so at runtime you need to populate it. You can do this in one of two ways:
- You can create a separate Data Flow Task within the same package to extract data from any source and load the data into a Cache Transformation, as shown in below the screen shot. You then configure the Cache Transformation to write the data to the CCM. Optionally, you can configure the same CCM to write the data to a cache file (usually with the extension .caw) on disk. When you execute the package, the Source Component will send the rows down the pipeline into the input of the Cache Transformation. The Cache Transformation will call the CCM, which loads the data into a local memory cache. If configured, the CCM will also save the cache to disk so you can use it again later. This method enables you to create persisted caches that you can share with other users, solutions, and packages.
- Alternatively, you can open up the CCM editor and directly specify the filename of an existing cache file (.caw file). This option requires that a cache file has previously been created for you to reuse. At execution time, the CCM loads the cache directly from disk and populates its internal memory structures.
When you configure a CCM, you can specify which columns of the input data set should be used as index fields and which columns should be used as reference fields (see below screen shot). This is a necessary step — the CCM needs to know up front which columns you will be joining on, so that it can create internal index structures to optimize the process.
Whichever way you created the CCM, when you execute the package, the CCM will contain an in-memory representation of the data you specified. That means that the cache is now immediately available for use by the Lookup Transformation. Note that the Lookup Transformation is the only component that uses the caching aspects of the CCM; however, the Raw File Source can also read .caw files, which can be useful for debugging.
If you are using the Lookup Transformation in full-cache mode, you can load the cache using the CCM (instead of specifying a SQL query as described earlier in this Post). To use the CCM option, open the Lookup Transformation and select Full Cache and Cache Connection Manager in the general pane of the editor, as shown in below screen shot. Then you can either select an existing CCM or create a new one. You can now continue configuring the Lookup Transformation in the same way you would if you had used a SQL query. The only difference is that in the Columns tab, you can only join on columns that you earlier specified as index columns in the CCM editor.
The CCM gives you several benefits. First of all, you can reuse caches that you previously saved to file (in the same or a different package). For instance, you can load a CCM using the Customer table and then save the cache to a .caw file on disk. Every other package that needs to do a Lookup against customers can then use a Lookup Transformation configured in full-cache/CCM mode, with the CCM pointing at the .caw file you created.
Second, reading data from a .caw file is generally faster than reading from OLE DB, so your packages should run faster. Of course, because the .caw file is an offline copy of your source data, it can become stale; therefore, it should be reloaded every so often. Note that you can use an expression for the CCM filename, which means that you can dynamically load specific files at runtime.
Third, the CCM enables you to reuse caches across loop iterations. If you use a Lookup Transformation in full-cache/OLE DB mode within an SSIS For Loop Container or Foreach Loop Container, the cache will be reloaded on every iteration of the loop. This may be your intended design, but if not, then it is difficult to mitigate the performance overhead. However, if you used a Lookup configured in full-cache/CCM mode, the CCM would be persistent across loop iterations, improving your overall package performance.