In DWH terminology, Extraction, Transformation, Loading (ETL) is called as Data Acquisition
It is a process of extracting relevant business information from multiple operational source systems, transforming the data into a homogenous format and loading into the DWH/Datamart.
Two types of ETL’s used in implementing data acquisition.
1) A) Code Based ETL: An ETL application can be developed using programming languages such as PL/SQL
Ex: Teradata ETL Utilities
2) SAS based & SAS access
B) GUI Based ETL: An application can be designed with a simple GUI interface, point & click techniques
Ex: Informatica, Datastage, Data junction, Abintio, Data services, Data manager, Oracle data integrator (OWB), SSIS (SQL Server Integration Service)
It is the process of reading the data from the following operational source systems (External to the organization and Internal databases)
Transforming the data from one format to the client required format/ business standard format. A staging is a temporary memory or buffer where the following data transformation activities take place
It’s a process of integrating the data from multiple data sources into a single operational data set Data Sources can be homogenous/heterogeneous
Two types of merging:
A) Horizontal Merging: Process of joining the records horizontally when the 2 sources are having different data definitions.
Ex: Source1—Emp Table
columns –Empno, Ename, Sal, Deptno
columns in the Dept Table– Dept no, Dname, Loc
B) Vertical Merging: Process of joining the records vertically when the 2 sources are having similar data structures
Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or between multiple sets of data that need to be merged or that will work together
Data scrubbing is the process of detecting and removing or correcting any information in a darticle;tabase that has some sort of error. This error can be because the data is wrong, incomplete, formatted incorrectly, or is a duplicate copy of another entry. Many data-intensive fields of business such as banking, insurance, retail, transportation, and telecommunications may use these sophisticated software applications to clean up a database’s information.
[Related Article:Trending Data Warehousing Tools]
Dumping/inserting of the processed data into the target system (DWH)
Two types of data loads:
A) Initial/Full load: is a process of loading the data into an empty target table
B) Incremental/Delta load: is a process of loading only new records/changed records, which takes place after initial load