Mindmajix

Bulk Insert Task in SSIS Package

RDBMS SERVER TASKS

These tasks could also be considered data preparation tasks, as they are responsible for bringing data sources into the ETL processes, but we have separated the Bulk Insert Task and the Execute SQL Task into this separate category because of the expectation that you will be working with data from relational database management systems (RDBMS) like SQL Server, Oracle, and DB2. The exception to this is the Bulk Insert Task, which is a wrapper for the SQL Server bulk-copy process.

Bulk Insert Task

The Bulk Insert Task enables you to insert data from a text or flat file into a SQL Server database table in the same high-octane manner as using a BULK INSERT statement or the bcp.exe command-line tool. In fact, the task is basically just a wizard to store the information needed to create and execute a bulk copying command at runtime (similar to BCP from a command line). If you aren’t familiar with using BCP, you can research the topic in detail in Books Online. The downside of the Bulk Insert Task is its strict data format, and it precludes being able to work with data in a Data Flow within one action. This can be seen as a disadvantage in that it does not allow any transformations to occur to the data in flight, but not all ETL processes are efficiently modified in the Data Flow. In high-volume extracts, you may be better served by loading the initial extract down in a staging table and extracting data in discrete chunks for processing within specific Data Flows. The Bulk Insert Task has no ability to transform data, and this trade-off in functionality gives you the fastest way to load data from a text file into a SQL Server database.

When you add a Bulk Insert Task to your Control Flow, follow these steps:

1. Open the Bulk Insert Task Editor to configure it. As in most tasks, use the General tab to name and describe the task. Make sure you name it something that describes its unit of work, like Prepare Staging. This will be helpful later when you deploy the package and troubleshoot problems. The next tab, Connection, is the most important. This tab enables you to specify the source and destination for the data.

2. Select the destination from the Connection dropdown in the Destination Connection group.

3. Specify a destination table from the next dropdown, below the destination connection.

4. While you’re specifying connections, go to the bottom to specify the source connection’s filename in the File dropdown. Both the source and the destination connections use the Connection Manager. If you haven’t already created the shared connections, you’ll be prompted to create them in either case by selecting <New Connection …>.

NOTE Both the source and the optional format file must be relative to the destination SQL Server, because the operation occurs there when a Bulk Insert Task is used. If you are using a network file location, use the UNC path (\MachineName\ShareName\FileName.csv) to the source or format file.

5. After you specify the connections, you need to provide file specifications for the format of file you’re importing. If you created the file using the BCP utility, you can use the -f option to create a format file as well. The Bulk Insert Task can then use the BCP format file to determine how the file is formatted, or you can select the column and row delimiters in the Format property of the task. The two options are:

  • Use File: This uses the BCP format (.fmt) file.
  • Specify: This enables you to select the file delimiters. The available delimiters are New Line ({CR}{LF}), Carriage Return ({CR}), Line Feed ({LF}), Semicolon (;), Comma (,), Tab, or Vertical Bar (|). Note that the defaults are for the row to be {CR}{LF} delimited, and the column tab-delimited.

6. In the Options tab of the Bulk Insert Task Editor, you can use some lesserknown options:

  • Code page: You can specify the code page for the source file. You will rarely want to change the code page from RAW, which is the default. Using RAW is the fastest data-loading option because no code page conversion takes place.
  • OEM: You should use this when copying from one SQL Server to another.
  • ACP: This converts non-Unicode data to the ANSI code page of the SQL Server you are loading the data into, or you can specify a specific code page mapping.
  • DataFileType: Specifies the type of the source file. Options here include char, native, widechar, and widenative. Generally, files you receive will be the default option, char, but in some cases, you may see a file with a native format.

A file (myzips_native.txt) in native format was created from SQL Server by using the bcp.exe program with the –n (native) switch and supplied with the download from www.wrox.com/go/prossis2014. You’ll see how to import it later in an example.

You can also use the Options tab to specify the first and last row to copy if you want only a sampling of the rows. This is commonly used to set the first row to two (2) when you want to skip a header row. The BatchSize option indicates how many records will be written to SQL Server before committing the batch. A BatchSize of 0 (the default) means that all the records will be written to SQL Server in a single batch. If you have more than 100,000 records, then you may want to adjust this setting to 50,000 or another number based on how many you want to commit at a time. The adjustment may vary based on the width of your file.

The Options dropdown contains five options that you can enable/disable:

  • Check Constraints: This option checks table and column constraints before committing the record. It is the only option enabled by default.
  • Keep Nulls: By selecting this option, the Bulk Insert Task will replace any empty columns in the source file with NULLs in SQL Server.
  • Enable Identity Insert: Enable this option if your destination table has an identity column into which you’re inserting. Otherwise, you will receive an error.
  • Table Lock: This option creates a SQL Server lock on the target table, preventing inserts and updates other than the records you are inserting. This option speeds up your process but may cause a production outage, as others are blocked from modifying the table. If you check this option, SSIS will not have to compete for locks to insert massive amounts of data into the target table. Set this option only if you’re certain that no other process will be competing with your task for table access.
  • Fire Triggers: By default, the Bulk Insert Task ignores triggers for maximum speed. When you check this option, the task will no longer ignore triggers and will instead fire the insert triggers for the table into which you’re inserting.

There are a few other options you can set in the Options tab. The SortedData option specifies what column you wish to sort by while inserting the data. This option defaults to sort nothing, which equals false. If you need to set this option, type the column name that you wish to sort. The MaxErrors option specifies how many errors are acceptable before the task is stopped with an error. Each row that does not insert is considered an error; by default, if a single row has a problem, the entire task fails.

NOTE The Bulk Insert Task does not log error-causing rows. If you want bad records to be written to an error file or table, it’s better to use the Data Flow Task.

Using the Bulk Insert Task

Take time out briefly to exercise the Bulk Insert Task with a typical data load by following these steps:

1. Create a new package called BulkInsertTask.dtsx. If you haven’t already downloaded the code files for this chapter from www.wrox.com/go/prossis2014, do so. Then extract the file for this chapter named myzips.csv.

2. Create a table in the AdventureWorksDW database using SQL Management Studio or the tool of your choice to store postal code information (code file Ch03SQL.txt):

CREATE TABLE PROSSIS_ZIPCODE (
ZipCode CHAR(5),
State CHAR(2),
ZipName VARCHAR(16)
)

3. Back in your new package, drag the Bulk Insert Task onto the Control Flow design pane. Notice that the task has a red icon on it, indicating that it hasn’t been configured yet.

4. Double-click the task to open the editor. In the General tab, provide the name Load Zip Codes for the Name option and Loads zip codes from a flat file for the description.

5. Click the Connection tab. From the Connection dropdown box, select<New connection …>. This will open the Configure OLE DB Connection Manager dialog.

6. Now, you’re going to create a connection to the AdventureWorksDW database that can be reused throughout this chapter. Click New to add a new Connection Manager. For the Server Name option, select the server that contains your AdventureWorksDW database. For the database, select the AdventureWorksDW database.

7. Click OK to go back to the previous screen, and click OK again to return to the Bulk Insert Task Editor. You’ll now see that the Connection Manager you just created has been transposed into the Connection dropdown box.

8. Now you need to define the destination. For the DestinationTable option, select the [dbo].[PROSSIS_ZIPCODE] table. For the first attempt, you’ll import a comma-delimited version of the zip codes. This simulates importing a file that would have been dumped out of another SQL Server (with the same table name) using this bcp command:

bcp AdventureWorksDW.dbo.prossis_zipcode out
c:\ProSSIS\tasks\bulkInsertTask\myzips.csv
-c -t, -T

9. Leave the remaining options set to the defaults. The RowDelimiter property option will be {CR}{LF} (a carriage return) and the ColumnDelimiter property should be set to Comma {,}.

9. Leave the remaining options set to the defaults. The RowDelimiter property option will be {CR}{LF} (a carriage return) and the ColumnDelimiter property should be set to Comma {,}.

10. For the File option, again select <New connection …> to create a new Connection Manager. This will open the File Connection Manager Editor.

11. For the Usage Type, select Existing File. Then point to myZips.csv for the File option. Click OK to return to the editor. Your final Bulk Insert Task Editor screen should look similar to Figure 3-20.

bulk insert task editor

If you open the myzips.csv file, you’ll notice there is no header row with the column names before the data. If you had a column header and needed to skip it, you would go to the Options tab and change the FirstRow option to 2. This would start the import process on the second row, instead of the first, which is the default.

12. You should be able to run the package now. When it executes, the table will be populated with all the postal codes from the import file. You can verify this by selecting all the rows from the PROSSIS_ZIPS table.

As you can see, the Bulk Insert Task is a useful tool to load staging files quickly, but you may need to further process the file. One reason is because this task provides no opportunity to divert the data into a transformation workflow to examine the quality of the data. Another reason is that you have to import character-based data to avoid raising errors during the loading process. The Bulk Insert Task handles errors in an all-or-nothing manner. If a single row fails to insert, then your task may fail (based on your setting for the maximum number of allowed errors). These problems can be easily solved by using a Data Flow Task if the data is unreliable.


0 Responses on Bulk Insert Task in SSIS Package"

Leave a Message

Your email address will not be published. Required fields are marked *

Copy Rights Reserved © Mindmajix.com All rights reserved. Disclaimer.
Course Adviser

Fill your details, course adviser will reach you.