Home / Data Warehousing

Data Acquisition – ETL Process In Data Warehousing

Rating: 5.0Blog-star
Views: 7354
by Ruchitha Geebu
Last modified: February 27th 2021

Data acquisition ETL Process in Data Warehousing

Data Acquisition:

In DWH terminology, Extraction, Transformation, Loading (ETL) is called as Data Acquisition

It is a process of extracting relevant business information from multiple operational source systems, transforming the data into a homogenous format and loading into the DWH/Datamart.

Two types of ETL’s used in implementing data acquisition.

1) A) Code  Based  ETL: An ETL application can be developed using programming languages such as PL/SQL

Ex: Teradata ETL Utilities

2) SAS based & SAS access

B) GUI Based ETL: An application can be designed with a simple GUI interface, point & click techniques

Ex: Informatica, Datastage, Data junction, Abintio, Data services, Data manager, Oracle data integrator (OWB), SSIS (SQL Server Integration Service)

Data Extraction:

It is the process of reading the data from the following operational source systems (External to the organization and Internal databases)

  • Relational Systems—-Oracle, SQL Server, DB2, Sybase, Informix, Redbrick
  • File sources: XML, COBOL, Flat files, weblog
  • Legacy systems: Mainframes, VSAM
  • Other sources: MS-Access, MS-Excel, MQ Series, TIBCO
  • ERP Sources: SAP R/3, Peoplesoft, JDEdwards, BAAN, RAMCO systems
Enhance your IT skills and proficiency in Data Warehousing by taking up the  Informatica Training.

Data Transformation:

Transforming the data from one format to the client required format/ business standard format. A staging is a temporary memory or buffer where the following data transformation activities take place

  1. Data merging
  2. Data Cleansing
  3. Data scrubbing
  4. Data aggregation

1. Data Merging:

It’s a process of integrating the data from multiple data sources into a single operational data set Data Sources can be homogenous/heterogeneous

Two types of merging:

A) Horizontal Merging: Process of joining the records horizontally when the 2 sources are having different data definitions.

Ex: Source1—Emp Table

columns –Empno, Ename, Sal, Deptno

Source2—-Dept Table

columns in the Dept Table– Dept no, Dname, Loc

B) Vertical Merging: Process of joining the records vertically when the 2 sources are having similar data structures

Frequently Asked Informatica Interview Questions

2. Data cleansing:

Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or between multiple sets of data that need to be merged or that will work together

3. Data scrubbing:R

Data scrubbing is the process of detecting and removing or correcting any information in a darticle;tabase that has some sort of error. This error can be because the data is wrong, incomplete, formatted incorrectly, or is a duplicate copy of another entry. Many data-intensive fields of business such as banking, insurance, retail, transportation, and telecommunications may use these sophisticated software applications to clean up a database’s information.

[Related Article:Trending Data Warehousing Tools]


4. Data aggregation:

Is a process of calculating the summaries from the detailed data

  • Data scrubbing is applied for each record
  • Data aggregation is applied for a group of records

Data Loading:

Dumping/inserting of the processed data into the target system (DWH)

Two types of data loads:

A) Initial/Full load: is a process of loading the data into an empty target table

B) Incremental/Delta load: is a process of loading only new records/changed records, which takes place after initial load

  • First, we load dimensions
  • Next, we load facts
Explore Informatica Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!



About Author

NameRuchitha Geebu
Author Bio

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.