Data Acquisition – ETL Process In Data Warehousing

  • (5.0)
  • | 3527 Ratings

Data acquisition ETL Process in Data Warehousing

Data Acquisition:

In DWH terminology, Extraction, Transformation, Loading (ETL) is called as Data Acquisition

It is a process of extracting relevant business information from multiple operational source systems, transforming the data into a homogenous format and loading into the DWH/Datamart.

Two types of ETL’s used in implementing data acquisition.

1) A) Code  Based  ETL: An ETL application can be developed using programming languages such as PL/SQL

Ex: Teradata ETL Utilities

2) SAS based & SAS access

B) GUI Based ETL: An application can be designed with a simple GUI interface, point & click techniques

Ex: Informatica, Datastage, Data junction, Abintio, Data services, Data manager, Oracle data integrator (OWB), SSIS (SQL Server Integration Service)

Data Extraction:

It is the process of reading the data from the following operational source systems (External to the organization and Internal databases)

  • Relational Systems—-Oracle, SQL Server, DB2, Sybase, Informix, Redbrick
  • File sources: XML, COBOL, Flat files, weblog
  • Legacy systems: Mainframes, VSAM
  • Other sources: MS-Access, MS-Excel, MQ Series, TIBCO
  • ERP Sources: SAP R/3, Peoplesoft, JDEdwards, BAAN, RAMCO systems
Enhance your IT skills and proficiency in Data Warehousing by taking up the  Informatica Training.

Data Transformation:

Transforming the data from one format to the client required format/ business standard format. A staging is a temporary memory or buffer where the following data transformation activities take place

  1. Data merging
  2. Data Cleansing
  3. Data scrubbing
  4. Data aggregation

1. Data Merging:

It’s a process of integrating the data from multiple data sources into a single operational data set Data Sources can be homogenous/heterogeneous

Two types of merging:

A) Horizontal Merging: Process of joining the records horizontally when the 2 sources are having different data definitions.

Ex: Source1—Emp Table

columns –Empno, Ename, Sal, Deptno

Source2—-Dept Table

columns in the Dept Table– Dept no, Dname, Loc

B) Vertical Merging: Process of joining the records vertically when the 2 sources are having similar data structures

Frequently Asked Informatica Interview Questions

2. Data cleansing:

Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or between multiple sets of data that need to be merged or that will work together

3. Data scrubbing:

Data scrubbing is the process of detecting and removing or correcting any information in a database that has some sort of error. This error can be because the data is wrong, incomplete, formatted incorrectly, or is a duplicate copy of another entry. Many data-intensive fields of business such as banking, insurance, retail, transportation, and telecommunications may use these sophisticated software applications to clean up a database’s information.

4. Data aggregation:

Is a process of calculating the summaries from the detailed data

  • Data scrubbing is applied for each record
  • Data aggregation is applied for a group of records

Data Loading:

Dumping/inserting of the processed data into the target system (DWH)

Two types of data loads:

A) Initial/Full load: is a process of loading the data into an empty target table

B) Incremental/Delta load: is a process of loading only new records/changed records, which takes place after initial load

  • First, we load dimensions
  • Next, we load facts
Explore Informatica Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!

Subscribe For Free Demo

Free Demo for Corporate & Online Trainings.

Ravindra Savaram
About The Author

Ravindra Savaram is a Content Lead at His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter. Protection Status