Flat files are one of the more common sources to work with because data in the flat files is easy to read and create by most RDBMS systems and ETL tools. The challenges in working with flat files deal with handling data in a format in which data types are not enforced, and data that is structured in challenging ways. You may also run into files that are encoded into a different code page than ASCII, such as a UNIX encoding.
SSIS can handle various formats of flat files with varying code pages. The only challenging data is unstructured data, but this can also be handled in SSIS — not with the Flat File Source adapter but rather through a Script Component that is acting as a source. Refer to Scripting in SSIS Topic for a primer on using the Script Component.
Loading Flat Files
Loading flat files from SSIS is a lot more straightforward than extracting data from a flat file. That’s because when you are loading data into a flat file from an SSIS Data Flow, SSIS already knows the specific data types and column lengths. Extracting data is harder because flat files do not contain information about the data types of the column or the structure of the file. This first example demonstrates how to use SSIS to create and load a flat file:
- Create a new SSIS package with a Data Flow Task.
- From the Toolbox, drag an OLE DB Source adapter onto the Data Flow and configure it to connect to the AdventureWorks database.
- Change the data access mode to SQL Command and type the following SQL statement (Flat_File_Select.txt) in the text window:SELECT Name, ProductNumber, ListPrice
- Switch to the Columns property page of the OLE DB Source Editor and change the column selection to include only ProductID, Name, ProductNumber, ListPrice, and Size (these should be the only columns that are checked).
- Select OK to save your changes to the OLE DB Source adapter.
- Add a Flat File Destination adapter to the Data Flow (be sure to use the Flat File Destination and not the Source!) and connect the blue data path from the OLE DB Source to the Flat File Destination.
- Double-click the Flat File Destination to open the editor.
- Select New next to the Flat File Connection Manager dropdown, which opens a new window named Flat File Format (see below screen shot). Choose Fixed Width and select OK, which opens the Flat File Connection Manager Editor.
Creating and configuring a Flat File Connection Manager is easier to create from within a Destination adapter that already understands the data than by adding a Flat File Connection Manager directly in the Connection Manager window.As below screen shot shows, there are several options for the format of the flat file. The options for the flat file are described right in the selection window.
- At this point, in the Flat File Connection Manager Editor, name your connection Products Flat File Destination, and pick a location and name for your file and enter it in the Filename window, such as c:\ProSSIS\data\products.txt.
- Open the Code Page dropdown list and observe the dozens of supported code pages — from ANSI 1252 to IBM EBCDIC to UTF to MAC. Any of these can be selected if you intend to send the file to another machine that will consume the data in a different format. Change the Code page to 65001 (UTF-8), which should be the last one on the list.
- Switch to the Advanced property page on the left, which displays a list of the columns that the Flat File Destination adapter received from the upstream transformation (in this case the Source adapter). Select OK to save the Flat File Connection Manager properties.
- Finally, in the Flat File Destination Editor, click the Mappings property page on the left, which by default maps the input columns to the columns created in the destination file.Note that in the Flat File Destination Editor, on the Connection Manager tab, there is a checkbox called “Overwrite data in the file” as Figure 14-32 shows. When this is checked, the file will be cleared before data is loaded in the Data Flow. If this is unchecked, then data will be appended to the file.
- Leave “Overwrite data in the file” checked and select OK to save the Flat File Destination properties.
- Run this simple package, which will create the flat file and overwrite any data that previously existed.
Extracting Data from Flat Files
Now that you have created and loaded a flat file, the next task is to understand how to extract data from a flat file. Of course, when you are working in your work environment, the first step in extracting data from a flat file is to make sure you have access to the file and you somewhat understand how the data is structured.
In this example, you will be working with a fixed-width file created in the prior example, encoded in UTF-8 code page format. The file contains a list of AdventureWorks products.
- Create a new SSIS package and a new Data Flow Task within the package.
- Drag the Flat File Source adapter from the Toolbox onto the Data Flow workspace and then double-click the Flat File Source to open the Flat File Source Editor.
- Connecting to a flat file requires using a package connection. Therefore, in the Flat File Source Editor, click the New button next to the Flat File Connection Manager dropdown, which opens the Flat File Connection Manager Editor.
- Name the connection Products File Source in the Connection manager name text box.
- Click the Browse button and find the products.txt file that you created in the last exercise (such as c:\ProSSIS\data\products.txt).
- Change the Code page to 65001 (UTF-8).
- Change the Format dropdown to the Fixed-width selection option, and be sure to uncheck the option Column names in the first data row.
- Switch to the Columns property page and note that because this file is a fixed-width file, you need to set the column widths. Click on the red vertical line and drag it to the right until the fields line up based on rows and columns, as below figure shows (it is easier if your window is maximized, and alternately, you can just change the Row width property to 97).
- Now you need to identify the fixed-width columns by clicking in the text space right before each column starts. You need to do this for every column.
- Click the Advanced property page tab and then click the Suggest Types button, which opens the Suggest Column Types window. Click OK to have SSIS scan the file and then suggest data types for the file.
- While you are still in the Advanced Editor, you should see Column 0 through Column 2. Click Column 0, and in the properties in the right window, change the Name property to ProductName.
- Click Column 2 and change its Name to ProductNumber.
- For Column 2, change the Name property to ListPrice and change the DataType dropdown to [DT_CY], as below screen shot shows.
- Select OK to save the Flat File Connection Manager properties.
- While still in the Flat File Source Editor, select the Columns tab, and verify that all the columns are checked in the Available External Columns list.
- Select OK to save the properties.
- From the Data Flow Toolbox, drag a Multicast Transformation to the Data Flow workspace and connect the blue data path from the Flat File Source adapter to the Multicast.
At this point, you would usually create downstream transformations or a destination. For the purpose of this example, run the package and observe how the flat file data is extracted into the Data Flow. Nothing is done with it, but it demonstrates how to extract data from a flat file.