Data Acquisition:
In DWH terminology, the Extraction, Transformation, and Loading (ETL) process is referred to as Data Acquisition.
It is the process of extracting relevant business information from multiple operational source systems, transforming the data into a homogeneous format, and loading it into the DWH/Data mart.
Two types of ETL tools are used in implementing data acquisition:
A) Code-based ETL: An ETL application is developed using programming languages such as PL/SQL.
Ex: Teradata ETL utilities, SAS Base & SAS/ACCESS
B) GUI-based ETL: An ETL application is designed through a graphical interface using point-and-click techniques.
Ex: Informatica, DataStage, Data Junction, Ab Initio, Data Services, Data Manager, Oracle Warehouse Builder (OWB), Oracle Data Integrator (ODI), SSIS (SQL Server Integration Services)
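As a rough illustration of the code-based approach, a minimal PL/SQL block might read from a staging table, apply a simple transformation, and load the result; the table names stg_emp and dwh_emp are hypothetical:

  BEGIN
    -- Extract from the staging table, transform, and load into the warehouse table
    INSERT INTO dwh_emp (Empno, Ename, Sal, Deptno)
    SELECT Empno,
           UPPER(TRIM(Ename)),   -- transform: standardize the name format
           Sal,
           Deptno
    FROM stg_emp
    WHERE Sal IS NOT NULL;       -- cleanse: skip records with a missing salary
    COMMIT;
  END;
  /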
Data Extraction:
It is the process of reading the data from the following operational source systems, both databases internal to the organization and sources external to it (a short extraction sketch follows the list):
- Relational systems: Oracle, SQL Server, DB2, Sybase, Informix, Red Brick
- File sources: XML, COBOL files, flat files, web logs
- Legacy systems: Mainframes, VSAM
- Other sources: MS Access, MS Excel, MQSeries, TIBCO
- ERP sources: SAP R/3, PeopleSoft, JD Edwards, Baan, Ramco Systems
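As a sketch of extraction from a relational source, data can be pulled into a staging table over a database link; the link name src_db and the table names here are hypothetical (Oracle syntax):

  -- Copy the source rows into the staging area over a database link
  INSERT INTO stg_emp (Empno, Ename, Sal, Deptno)
  SELECT Empno, Ename, Sal, Deptno
  FROM Emp@src_db;
  COMMIT;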
Data Transformation:
It is the process of transforming the data from the source format into the client-required/business-standard format. A staging area is a temporary storage area (buffer) where the following data transformation activities take place:
- Data merging
- Data cleansing
- Data scrubbing
- Data aggregation
1. Data Merging:
It is the process of integrating the data from multiple data sources into a single data set. The data sources can be homogeneous or heterogeneous.
Two types of merging:
A) Horizontal Merging: The process of joining records horizontally (side by side) when the two sources have different data definitions.
Ex: Source 1: Emp table (columns: Empno, Ename, Sal, Deptno)
Source 2: Dept table (columns: Deptno, Dname, Loc)
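As a sketch, this horizontal merge can be expressed as a join on the shared Deptno column, assuming both tables have been staged:

  -- Join Emp and Dept horizontally on the common Deptno column
  SELECT e.Empno, e.Ename, e.Sal, d.Deptno, d.Dname, d.Loc
  FROM Emp e
  JOIN Dept d
    ON e.Deptno = d.Deptno;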
B) Vertical Merging: The process of joining records vertically (appending them) when the two sources have similar data structures.
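A minimal sketch, assuming two regional Emp tables with identical structures (the table names Emp_North and Emp_South are hypothetical):

  -- Append the records of two structurally identical sources
  SELECT Empno, Ename, Sal, Deptno FROM Emp_North
  UNION ALL
  SELECT Empno, Ename, Sal, Deptno FROM Emp_South;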
2. Data Cleansing:
Data cleansing is the process of ensuring that a set of data is correct and accurate. During this process, records are checked for accuracy and consistency, and they are either corrected or deleted as necessary. This can occur within a single set of records or across multiple data sets that need to be merged or work together.
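A minimal cleansing sketch over a hypothetical staging table stg_emp: standardize a text column in place and delete records that cannot be corrected:

  -- Correct inconsistent formatting in place
  UPDATE stg_emp
  SET Ename = UPPER(TRIM(Ename));

  -- Delete records that are missing the business key and cannot be corrected
  DELETE FROM stg_emp
  WHERE Empno IS NULL;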
3. Data Scrubbing:
Data scrubbing is the process of detecting and removing or correcting any information in a database that has some sort of error. The error may be because the data is wrong, incomplete, formatted incorrectly, or a duplicate copy of another entry. Many data-intensive fields of business, such as banking, insurance, retail, transportation, and telecommunications, use sophisticated software applications to clean up a database's information.
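As one common scrubbing task, duplicate entries can be removed. The sketch below keeps one physical copy of each Empno; it uses Oracle-specific ROWID syntax, and stg_emp is hypothetical:

  -- Remove duplicate copies, retaining one row per Empno
  DELETE FROM stg_emp a
  WHERE a.ROWID > (SELECT MIN(b.ROWID)
                   FROM stg_emp b
                   WHERE b.Empno = a.Empno);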
4. Data Aggregation:
It is the process of calculating summaries from the detailed data.
- Data scrubbing is applied to each record
- Data aggregation is applied to a group of records (see the sketch below)
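Continuing the Emp example, a minimal aggregation sketch that summarizes salaries per department:

  -- Calculate summary figures from the detailed Emp records
  SELECT Deptno,
         COUNT(*) AS emp_count,
         SUM(Sal) AS total_sal
  FROM Emp
  GROUP BY Deptno;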
Data Loading:
It is the process of dumping/inserting the processed data into the target system (DWH).
Two types of data loads:
A) Initial/Full load: the process of loading the entire data set into an empty target table
B) Incremental/Delta load: the process of loading only new or changed records, which takes place after the initial load (sketched below)
- First, we load dimensions
- Next, we load facts (fact records reference dimension keys, so the dimensions must already be in place)
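A minimal incremental-load sketch using a MERGE (upsert) statement; stg_emp and dwh_emp are hypothetical staging and warehouse tables:

  -- Apply the delta: update changed records, insert new ones
  MERGE INTO dwh_emp t
  USING stg_emp s
    ON (t.Empno = s.Empno)
  WHEN MATCHED THEN
    UPDATE SET t.Ename  = s.Ename,
               t.Sal    = s.Sal,
               t.Deptno = s.Deptno
  WHEN NOT MATCHED THEN
    INSERT (Empno, Ename, Sal, Deptno)
    VALUES (s.Empno, s.Ename, s.Sal, s.Deptno);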