These tasks could also be considered data preparation tasks, as they are responsible for bringing data sources into the ETL processes, but we have separated the Bulk Insert Task and the Execute SQL Task into this separate category because of the expectation that you will be working with data from relational database management systems (RDBMS) like SQL Server, Oracle, and DB2. The exception to this is the Bulk Insert Task, which is a wrapper for the SQL Server bulk-copy process.
The Bulk Insert Task enables you to insert data from a text or flat file into a SQL Server database table in the same high-octane manner as using a BULK INSERT statement or the bcp.exe command-line tool. In fact, the task is basically just a wizard to store the information needed to create and execute a bulk copying command at runtime (similar to BCP from a command line). If you aren’t familiar with using BCP, you can research the topic in detail in Books Online. The downside of the Bulk Insert Task is its strict data format, and it precludes being able to work with data in a Data Flow within one action. This can be seen as a disadvantage in that it does not allow any transformations to occur to the data in flight, but not all ETL processes are efficiently modified in the Data Flow. In high-volume extracts, you may be better served by loading the initial extract down in a staging table and extracting data in discrete chunks for processing within specific Data Flows. The Bulk Insert Task has no ability to transform data, and this trade-off in functionality gives you the fastest way to load data from a text file into a SQL Server database.
When you add a Bulk Insert Task to your Control Flow, follow these steps:
1. Open the Bulk Insert Task Editor to configure it. As in most tasks, use the General tab to name and describe the task. Make sure you name it something that describes its unit of work, like Prepare Staging. This will be helpful later when you deploy the package and troubleshoot problems. The next tab, Connection, is the most important. This tab enables you to specify the source and destination for the data.
2. Select the destination from the Connection dropdown in the Destination Connection group.
3. Specify a destination table from the next dropdown, below the destination connection.
4. While you’re specifying connections, go to the bottom to specify the source connection’s filename in the File dropdown. Both the source and the destination connections use the Connection Manager. If you haven’t already created the shared connections, you’ll be prompted to create them in either case by selecting.
NOTE Both the source and the optional format file must be relative to the destination SQL Server because the operation occurs there when a Bulk Insert Task is used. If you are using a network file location, use the UNC path (MachineNameShareNameFileName.csv) to the source or format file.
5. After you specify the connections, you need to provide file specifications for the format of the file you’re importing. If you created the file using the BCP utility, you can use the -f option to create a format file as well. The Bulk Insert Task can then use the BCP format file to determine how the file is formatted, or you can select the column and row delimiters in the Format property of the task. The two options are:
6. In the Options tab of the Bulk Insert Task Editor, you can use some lesser-known options:
A file (myzips_native.txt) in native format was created from SQL Server by using the bcp.exe program with the –n (native) switch and supplied with the download from www.wrox.com/go/prossis2014. You’ll see how to import it later in an example.
You can also use the Options tab to specify the first and last row to copy if you want only a sampling of the rows. This is commonly used to set the first row to two (2) when you want to skip a header row. The BatchSize option indicates how many records will be written to SQL Server before committing the batch. A BatchSize of 0 (the default) means that all the records will be written to SQL Server in a single batch. If you have more than 100,000 records, then you may want to adjust this setting to 50,000 or another number based on how many you want to commit at a time. The adjustment may vary based on the width of your file.
The Options dropdown contains five options that you can enable/disable:
There are a few other options you can set in the Options tab. The SortedData option specifies what column you wish to sort by while inserting the data. This option defaults to sort nothing, which equals false. If you need to set this option, type the column name that you wish to sort. The MaxErrors option specifies how many errors are acceptable before the task is stopped with an error. Each row that does not insert is considered an error; by default, if a single row has a problem, the entire task fails.
Frequently Asked SSIS Interview Questions & Answers
NOTE The Bulk Insert Task does not log error-causing rows. If you want bad records to be written to an error file or table, it’s better to use the Data Flow Task.
Using the Bulk Insert Task
Take time out briefly to exercise the Bulk Insert Task with a typical data load by following these steps:
1. Create a new package called BulkInsertTask.dtsx. If you haven’t already downloaded the code files for this chapter from www.wrox.com/go/prossis2014, do so. Then extract the file for this chapter named myzips.csv.
2. Create a table in the AdventureWorksDW database using SQL Management Studio or the tool of your choice to store postal code information (code file Ch03SQL.txt):
3. Back in your new package, drag the Bulk Insert Task onto the Control Flow design pane. Notice that the task has a red icon on it, indicating that it hasn’t been configured yet.
4. Double-click the task to open the editor. In the General tab, provide the name Load Zip Codes for the Name option and Loads zip codes from a flat file for the description.
5. Click the Connection tab. From the Connection dropdown box, select. This will open the Configure OLE DB Connection Manager dialog.
6. Now, you’re going to create a connection to the AdventureWorksDW database that can be reused throughout this chapter. Click New to add a new Connection Manager. For the Server Name option, select the server that contains your AdventureWorksDW database. For the database, select the AdventureWorksDW database.
7. Click OK to go back to the previous screen and click OK again to return to the Bulk Insert Task Editor. You’ll now see that the Connection Manager you just created has been transposed into the Connection dropdown box.
8. Now you need to define the destination. For the DestinationTable option, select the [dbo].[PROSSIS_ZIPCODE] table. For the first attempt, you’ll import a comma-delimited version of the zip codes. This simulates importing a file that would have been dumped out of another SQL Server (with the same table name) using this bcp command:
9. Leave the remaining options set to the defaults. The RowDelimiter property option will be {CR}{LF} (a carriage return) and the ColumnDelimiter property should be set to Comma {,}.
10. For the File option, again select to create a new Connection Manager. This will open the File Connection Manager Editor.
11. For the Usage Type, select Existing File. Then point to myZips.csv for the File option. Click OK to return to the editor. Your final Bulk Insert Task Editor screen should look similar to Figure 3-20.
If you open the myzips.csv file, you’ll notice there is no header row with the column names before the data. If you had a column header and needed to skip it, you would go to the Options tab and change the FirstRow option to 2. This would start the import process on the second row, instead of the first, which is the default.
12. You should be able to run the package now. When it executes, the table will be populated with all the postal codes from the import file. You can verify this by selecting all the rows from the PROSSIS_ZIPS table.
As you can see, the Bulk Insert Task is a useful tool to load staging files quickly, but you may need to further process the file. One reason is that this task provides no opportunity to divert the data into a transformation workflow to examine the quality of the data. Another reason is that you have to import character-based data to avoid raising errors during the loading process. The Bulk Insert Task handles errors in an all-or-nothing manner. If a single row fails to insert, then your task may fail (based on your setting for the maximum number of allowed errors). These problems can be easily solved by using a Data Flow Task if the data is unreliable.
SSRS | Power BI |
SSAS | SQL Server |
SCCM | SQL Server DBA |
SharePoint | BizTalk Server |
Team Foundation Server | BizTalk Server Administrator |
Name | Dates | |
---|---|---|
SSIS Training | Sep 17 to Oct 02 | View Details |
SSIS Training | Sep 21 to Oct 06 | View Details |
SSIS Training | Sep 24 to Oct 09 | View Details |
SSIS Training | Sep 28 to Oct 13 | View Details |
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.