For more than 2 yrs of experience people this PERFORMANCE TUNING mandatory concept and they must have good knowledge on this.
Situations to go:
a) To create a package with optimization.
b) There is a package which is running such a long time.
In the above situation we need to identify the “bottle necks” and resolve it.
This bottle necks can be at many levels
Identifying bottle necks
By using “progress tab” information or by using “log providers” we identify the bottle necks (because they display step by step execution).
Package level tuning tips
Implement check points to have better restart ability of components in the package.
a) Disable event handlers: Event handler decrease package performance so, unnecessary event handlers should be removed or disabled.
b) Maximum concurrent executables: Increasing the one no.of executable will increase the parallelism of package and concurrently execute in less time.
c) Maximum error count: Default ‘1’ means it fails for single error in the package. If you increase the error count it doesn’t fail the package until it reaches the count.
1) Delay validation: (it’s true/ false)
True means the validations of component is delayed until the execution of other component finished.
Description: Until execute SQL task1” execution finishes. “Execute SQL task2”. Validation not started.
BLOB temp storage path: Specify this at the time of working with Binary large objects such as images, media files etc..
Default buffer max. rows and size: Increase or decrease according to the volume of data loading. i.e., for more volume increase rows and buffer size, for less volume, decrease rows and buffer size.
Engine threads: Default it takes ‘10’, if we increase more threads it runs more parallel and uses more processes to finish the data flow operations.
Run in optimized mode: If it’s true the dataflow avoids unnecessary transformations, conservations etc operations usage.
a) In case of flat file
Try to take the flat file local to the system.
Use the property “fast parse=true”. So that the columns user faster, local neutral parsing routines and avoid unnecessary conversions.
b) If the source is table or view
Create indexes on the source table so that it set gives the data faster.
Instead of taking a table, take an SQL query or SQL command as data access mode to get required columns and rows of data.
a) In case of flat file
b) Use data access mode as fast load to load the data much faster.
c) The table contains constraints, indexes, triggers, and then loading will be slow. So we need to disable or drop them, later once the loading is finished recreate or enable them.
To implement this there are many ways:
Another way to implement the above package functionality is by event handling by
On pre execute of data flow → Disable/drop functionality
On post execute of data flow → Enable /recreate
We need to identify either the transformation is synchronous or asynchronous.
Asynchronous transformation always taken time to run than synchronous.
Here the rows are processed as it is getting from the source.
Eg: derived column, data conversion…. Etc.
This transformation occupies less memory and processes
In this case until the required rows the system collects, it doesn’t perform any operation.
It requires more memory and process many rows. This transformation either blocks partially or fully. So these transformations can also be called as “partially blocking or full blocking transformations”.
Eg: sort, aggregate, merge..etc.
In this transformation, until they receive all rows they will not perform any operation.
Get Updates on Tech posts, Interview & Certification questions and training schedules