Home  >  Blog  >   Data Warehousing  > 

Data Acquisition – ETL Process In Data Warehousing

Rating: 5

Data acquisition ETL Process in Data Warehousing

Data Acquisition:

In DWH terminology, Extraction, Transformation, Loading (ETL) is called as Data Acquisition

It is a process of extracting relevant business information from multiple operational source systems, transforming the data into a homogenous format and loading into the DWH/Datamart.

Two types of ETL’s used in implementing data acquisition.

1) A) Code  Based  ETL: An ETL application can be developed using programming languages such as PL/SQL

Ex: Teradata ETL Utilities

2) SAS based & SAS access

B) GUI Based ETL: An application can be designed with a simple GUI interface, point & click techniques

Ex: Informatica, Datastage, Data junction, Abintio, Data services, Data manager, Oracle data integrator (OWB), SSIS (SQL Server Integration Service)

Data Extraction:

It is the process of reading the data from the following operational source systems (External to the organization and Internal databases)

  • Relational Systems—-Oracle, SQL Server, DB2, Sybase, Informix, Redbrick
  • File sources: XML, COBOL, Flat files, weblog
  • Legacy systems: Mainframes, VSAM
  • Other sources: MS-Access, MS-Excel, MQ Series, TIBCO
  • ERP Sources: SAP R/3, Peoplesoft, JDEdwards, BAAN, RAMCO systems
Enhance your IT skills and proficiency in Data Warehousing by taking up the  Informatica Training.

Data Transformation:

Transforming the data from one format to the client required format/ business standard format. A staging is a temporary memory or buffer where the following data transformation activities take place

  1. Data merging
  2. Data Cleansing
  3. Data scrubbing
  4. Data aggregation

1. Data Merging:

It’s a process of integrating the data from multiple data sources into a single operational data set Data Sources can be homogenous/heterogeneous

Two types of merging:

A) Horizontal Merging: Process of joining the records horizontally when the 2 sources are having different data definitions.

Ex: Source1—Emp Table

columns –Empno, Ename, Sal, Deptno

Source2—-Dept Table

columns in the Dept Table– Dept no, Dname, Loc

B) Vertical Merging: Process of joining the records vertically when the 2 sources are having similar data structures

Frequently Asked Informatica Interview Questions

2. Data cleansing:

Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or between multiple sets of data that need to be merged or that will work together

3. Data scrubbing:R

Data scrubbing is the process of detecting and removing or correcting any information in a darticle;tabase that has some sort of error. This error can be because the data is wrong, incomplete, formatted incorrectly, or is a duplicate copy of another entry. Many data-intensive fields of business such as banking, insurance, retail, transportation, and telecommunications may use these sophisticated software applications to clean up a database’s information.

[Related Article:Trending Data Warehousing Tools]


4. Data aggregation:

Is a process of calculating the summaries from the detailed data

  • Data scrubbing is applied for each record
  • Data aggregation is applied for a group of records

Data Loading:

Dumping/inserting of the processed data into the target system (DWH)

Two types of data loads:

A) Initial/Full load: is a process of loading the data into an empty target table

B) Incremental/Delta load: is a process of loading only new records/changed records, which takes place after initial load

  • First, we load dimensions
  • Next, we load facts
Explore Informatica Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!



Join our newsletter

Stay updated with our newsletter, packed with Tutorials, Interview Questions, How-to's, Tips & Tricks, Latest Trends & Updates, and more ➤ Straight to your inbox!

Course Schedule
Snowflake TrainingJan 22 to Feb 06
Snowflake TrainingJan 24 to Feb 08
Snowflake TrainingJan 29 to Feb 13
Snowflake TrainingJan 31 to Feb 15
Last updated: 27 Feb 2021
About Author
Remy Sharp
Ruchitha Geebu

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.