Data Warehousing Concepts in SSIS

Microsoft SQL Server's SQL Server Integration Service (SSIS) may be used to perform a broad variety of data migration activities. As a tool for extracting and transforming data, SSIS may be used for a wide range of tasks including data cleansing and aggregation as well as merge and fusion. SQL Server Integration Services are explained in detail in this blog, both at a basic and intermediate level.

Data Warehousing Concepts

DATA:  data is composed of observable and recordable facts that are often found in operational for transactions systems.

OLTP:  OLTP is nothing but an observation of online transaction processing. The system is an applicable application that modifies data the instance it receives and has a large number of concurrent users.

OLAP: OLAP is an abbreviation of online analytical processing this system is an application that collects manager processes and presents multidimensional data for analysis and management purpose.

DATA MINING: data mining is the process of analyzing data from different perspectives and summarizing it into useful information.

BI:  BI is the leveraging of BW to help make business decisions and recommendations. Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.

DWH DEFINITIONS:  Bill Inmon, considered to be the father of data warehousing provides the following definition

“A data warehouse is a Subject-oriented, Integrated, Non-volatile, And time-variant.

Collection of data in support of management’s decision seam Kelly, another leading data warehousing Practitioner defines the data warehouse in the following way.

The data in the data warehouse is,

                               “Separate

                                Available

                                Integrated

                                Time-stamped

                                Subject-oriented

                                Non-volatile

                                Accessible “

Subject-oriented

In the data warehouse, data is not stored by operational applications, but by business subjects.

Want to become a master in Cloud Data Warehouse - then enroll in our Snowflake Certification Course

MindMajix Youtube Channel

Operational applications data warehouse subject

Operational Applications

        Operational Applications

Data warehouse Subjects

 Data warehouse Subjects

Integrated

Data inconsistencies are removed; data from diversified operational applications is integrated

Integrated

Here are some of the items that would need standardization:

  • Naming conventions
  • Codes
  • Data attributing
  • Measurements

Time-variant data:

The time-variant nature of the data in a data warehouse

  • Allows for analysis of the past
  • Relates information to the presents
  • Enables forecasts for the future

Frequently Asked MSBI Interview Questions & Answers

 Non-volatile data:

 Usually, the data in the data warehous

 

 

e is not updated for deleted,

Non-volatile data

Data granularity:

Data granularity in a data warehouse refers to the level of detail data. The lower level details, the finer the data granularity.

  • Depending on the requirements multiple levels of details may be present. Many data warehouses have at least dual levels of granularity.

Three data levels in a banking data warehouse

Daily detailmonthly summeryquarterly summery
AccountAccountAccount
Available dateMonthMonth
AmountNumber of transactionsNumber of transactions
Deposit/withdrawalWith drawlsDepositsBeginning balance
Ending balance
With drawlsDepositsBeginning balance
Ending balance

Data warehouse approaches

RALPH KIMBAL
(Bottom-up approach)W.H 2 NMON

(Top down approach) Here data marts designed first, later from data marts for warehouse Enterprise DWH concentrated first, next data marts

 

[Related Article: Top Data Warehousing Tools]

 

Top-down approach:

The advantages of this approach are:

  • A truly corporate effort, an enterprise view of data
  • Inherently architected – not a union of the separate data marts
  • Single, central storage of data about the content
  • Centralized rules and control
  • May sea quick results if implemented with iterations

The disadvantages are:

  • Takes longer to build even with an iterative method
  • High exposure risk to failure
  • Needs a high level of cross – fundamental skills
  • High outlay without proof of concept.

Bottom-up approach:

The advantages of these approaches are:

  • Faster and easier implementation of manageable pieces
  • Favorable returns on investment and proof of concept
  • Less risk of failure
  • Inherently, incremental: can schedule important data marts first
  • Allows the project team to learn and grow

The disadvantages are:

  • Each data mart has its own narrow view of data
  • Permits redundant data in every data mart
  • Perpetuates inconsistent and irreconcilable data
  • Proliferates, unmanageable interfaces.

DWH Life Cycle

 DWH Life Cycle

Difference between data warehouse and OLTP

S.noOLTPDWH
1Designed for transaction processingDesigned for decision support
2VolatileStore non-volatile
3Store current data onlyStore historical data
4Store detail dataStore summarized data
5More joinsLess joins
6Normalized (more tables )De normalized (less tables )
7Less indexesMore indexes
8Low-level managers accessHigh-level manager access
9Read, update, insert, deleteRead
10Access frequency high and in secLow, sec to minutes

Types of OLAP

ROLAP: (Relational OLAP):

Applied on relational sources both data and aggregate information store under relational sources.

Ex: BO, congo’s crystal reports, micro strategy

MOLAP: (multi dimensional OLAP):

Here, analysis will be done on multi dimensional application here, data and aggregate information store under multi dimensional sources.

Eg: congo’s SSAS, hyperion, Micro-strategy etc…………

HOLAP: (hybrid OLAP)

Here data store in relational sources and aggregated values store under multi dimensional sources       “cubes”

Eg: cognos, BO, Micro strategic……………

ETL

a) Code based.

Development cost, testing, maintenance

Eg:    sas base, sas access, tera data utilities, sql, plsql.

b)    GUI based etl- informatica, data stage, abinitio etc …..

Data Extraction

a) relational sources: ORACLE, SQL SERVER, TD, DB2, INFORMIX, SY BASE, RED BRICK

b) ERP sources: SAP R/3, DEOPLESOFT, J.D. EDWARDS, BANN, RAMCO MARSHALL.

c) Main frames files: COBOL FILES, IMS FILES, JCL FILES, DB2, JCL

d) File sources: FLAT FILES (TEXT FILES), XML FILES

e) Other sources: WEB LOGIC FILES, TIBCO M2 SERVICES, EXCEL SHEET, and PDF ETC……………….

 
Explore MSBI Sample Resumes! Download & Edit, Get Noticed by Top Employers!Download Now!

List of Related Microsoft Certification Courses:

 SSRS Power BI
 SSAS SQL Server
 SCCM SQL Server DBA
 SharePoint BizTalk Server
 Team Foundation Server BizTalk Server Administrator

 

Course Schedule
NameDates
MSBI TrainingOct 15 to Oct 30View Details
MSBI TrainingOct 19 to Nov 03View Details
MSBI TrainingOct 22 to Nov 06View Details
MSBI TrainingOct 26 to Nov 10View Details
Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

read less
  1. Share:
MSBI Articles