Home  >  Blog  >   SSIS

Advanced Data Cleansing in SSIS

One of the primary functions of SSIS is data cleansing, which involves transforming data from a source to a destination and conducting operations on it along the way. This post will teach all you need to know about SSIS, from the basics to advanced data cleansing.

Rating: 4
  1. Share:
SSIS Articles

Today, we are going to learn about Advanced Data Cleansing in SSIS

  1. Using the Derived Column Transformation for advanced data cleansing
  2. Applying the Fuzzy Lookup and Fuzzy Grouping transformations and understanding how they work
  3. Introducing Data Quality Services
  4. Introducing Master Data Services
Learn how to use SSIS, from beginner basics to advanced techniques, with online video tutorials taught by industry experts. Enroll for Free SSIS Training Demo!

In this Topic, you will learn the ins and outs of data cleansing in SSIS, from the basics to the advanced. In a broad sense, one of SSIS’s main purposes is to cleanse data — that is, transform data from a source to a destination and perform operations on it along the way. In that sense, someone could correctly say that every transformation in SSIS is about data cleansing. For example, consider the following transformations:

  1. The Data Conversion adjusts data types.
  2. The Sort removes duplicate data.
  3. The Merge Join correlates data from two sources.
  4. The Derived Column applies expression logic to data.
  5. The Data Mining predicts values and exceptions.
  6. The Script applies .NET logic to data.
  7. The Term Extraction and Term Lookup perform text mining.

In a stricter sense, data cleansing is about identifying incomplete, incorrect, or irrelevant data and then updating, modifying, or removing the “dirty” data. From this perspective, SSIS has four primary data cleansing transformations, which are reviewed in this Topic:

Derived Column Transformation

This transformation can perform advanced expression-based data cleansing. If you have just basic data cleansing needs, like blanks or nulls or simple text parsing, this is the right place to start. The next section will walk through some examples.

MindMajix Youtube Channel

Fuzzy Lookup Transformation

Capable of joining to external data based on data similarity, the Fuzzy Lookup Transformation is a core data cleansing tool in SSIS. This transformation is perfect if you have dirty data input that you want to associate to data in a table in your database based on similar values. Later in the Topic, you’ll take a look at the details of the Fuzzy Lookup Transformation and what happens behind the scenes.

Fuzzy Grouping Transformation

The main purpose is the de-duplication of similar data. The Fuzzy Grouping Transformation is ideal if you have data from a single source and you know you have duplicates that you need to find.

DQS Cleansing:

The Data Quality Services Cleansing Transformation leverages the DQS engine to perform predefined data quality rules and mapping. If you have any advanced cleansing where you would like to apply rules and manage cleansing logic, the DQS Transformation using the DQS engine is the right choice for you.

In addition to these data cleansing transformations, SSIS also has a Data Profiling Task that can help you identify any issues within your dirty data as you plan its necessary data cleansing. See SSIS Task Topic for an overview of the Data Profiling Task and the topic of Loading the Data Warehouse for a more detailed review of its functionality.

Frequently Asked SSIS Interview Questions & Answers

This Topic will also explore Master Data Services as a way of standardizing reference data. MDS give users the familiar interface of Excel to manage and correct data to truly have one version of the truth.

List of Related Microsoft Certification Courses:

 SSRS Power BI
 SSAS SQL Server
 SharePoint BizTalk Server
 Team Foundation Server BizTalk Server Administrator


Join our newsletter

Stay updated with our newsletter, packed with Tutorials, Interview Questions, How-to's, Tips & Tricks, Latest Trends & Updates, and more ➤ Straight to your inbox!

Course Schedule
SSIS TrainingMay 21 to Jun 05View Details
SSIS TrainingMay 25 to Jun 09View Details
SSIS TrainingMay 28 to Jun 12View Details
SSIS TrainingJun 01 to Jun 16View Details
Last updated: 03 Apr 2023
About Author

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.

read more
Recommended Courses

1 / 15