Data Acquisition – ETL Process In Data Warehousing

The process of data acquisition includes obtaining pertinent business information, translating it into the needed business format, and feeding it into the target system. A data acquisition process involves the extraction, transformation, and loading of data. We have discussed the ETL procedure in data warehousing in this blog.

Rating: 5
  
 
8165

Data acquisition ETL Process in Data Warehousing

Data Acquisition:

In DWH terminology, Extraction, Transformation, Loading (ETL) is called as Data Acquisition

It is a process of extracting relevant business information from multiple operational source systems, transforming the data into a homogenous format and loading into the DWH/Datamart.

Two types of ETL’s used in implementing data acquisition.

1) A) Code  Based  ETL: An ETL application can be developed using programming languages such as PL/SQL

Ex: Teradata ETL Utilities

2) SAS based & SAS access

B) GUI Based ETL: An application can be designed with a simple GUI interface, point & click techniques

Ex: Informatica, Datastage, Data junction, Abintio, Data services, Data manager, Oracle data integrator (OWB), SSIS (SQL Server Integration Service)

MindMajix Youtube Channel

Data Extraction:

It is the process of reading the data from the following operational source systems (External to the organization and Internal databases)

  • Relational Systems—-Oracle, SQL Server, DB2, Sybase, Informix, Redbrick
  • File sources: XML, COBOL, Flat files, weblog
  • Legacy systems: Mainframes, VSAM
  • Other sources: MS-Access, MS-Excel, MQ Series, TIBCO
  • ERP Sources: SAP R/3, Peoplesoft, JDEdwards, BAAN, RAMCO systems
Enhance your IT skills and proficiency in Data Warehousing by taking up the  Informatica Training.

Data Transformation:

Transforming the data from one format to the client required format/ business standard format. A staging is a temporary memory or buffer where the following data transformation activities take place

  1. Data merging
  2. Data Cleansing
  3. Data scrubbing
  4. Data aggregation

1. Data Merging:

It’s a process of integrating the data from multiple data sources into a single operational data set Data Sources can be homogenous/heterogeneous

Two types of merging:

A) Horizontal Merging: Process of joining the records horizontally when the 2 sources are having different data definitions.

Ex: Source1—Emp Table

columns –Empno, Ename, Sal, Deptno

Source2—-Dept Table

columns in the Dept Table– Dept no, Dname, Loc

B) Vertical Merging: Process of joining the records vertically when the 2 sources are having similar data structures

Frequently Asked Informatica Interview Questions

2. Data cleansing:

Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or between multiple sets of data that need to be merged or that will work together

3. Data scrubbing:R

Data scrubbing is the process of detecting and removing or correcting any information in a darticle;tabase that has some sort of error. This error can be because the data is wrong, incomplete, formatted incorrectly, or is a duplicate copy of another entry. Many data-intensive fields of business such as banking, insurance, retail, transportation, and telecommunications may use these sophisticated software applications to clean up a database’s information.

[Related Article:Trending Data Warehousing Tools]

 

4. Data aggregation:

Is a process of calculating the summaries from the detailed data

  • Data scrubbing is applied for each record
  • Data aggregation is applied for a group of records

Data Loading:

Dumping/inserting of the processed data into the target system (DWH)

Two types of data loads:

A) Initial/Full load: is a process of loading the data into an empty target table

B) Incremental/Delta load: is a process of loading only new records/changed records, which takes place after initial load

  • First, we load dimensions
  • Next, we load facts
Explore Informatica Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!

 

 

Join our newsletter
inbox

Stay updated with our newsletter, packed with Tutorials, Interview Questions, How-to's, Tips & Tricks, Latest Trends & Updates, and more ➤ Straight to your inbox!

Course Schedule
NameDates
Snowflake TrainingApr 20 to May 05View Details
Snowflake TrainingApr 23 to May 08View Details
Snowflake TrainingApr 27 to May 12View Details
Snowflake TrainingApr 30 to May 15View Details
Last updated: 03 Apr 2023
About Author

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.

read more