SAP’s Big Data Ecosystem Overview
Like most things in the real world, the big data landscape is complex. Competitive advantage doesn’t come from having one tool, but from having the right toolset to support business needs.
SAP provides an integrated set of data management solutions for big data: SAP HANA for real-time analytics on operational and transactional data, SAP IQ for petabyte-scale storage and analytics of less time-critical data, and Hadoop as a massive data lake where data can be stored and explored. SAP does not market its own Hadoop distribution, but provides an open platform that works with a variety of Hadoop distributions. Lastly, SAP provides a suite of Information Management solutions to integrate systems, ensure data quality, and manage the overall data landscape.
The heart of SAP’s big data platform is SAP HANA. At its core, SAP HANA provides an in-memory, columnar, distributed database architecture designed to handle massive datasets. Because the SAP HANA database resides entirely in memory, complex calculations, functions, and data-intensive operations can run on the data directly in the database, without requiring time-consuming and costly aggregations.
SAP IQ was the first commercial column store and is designed to scale to petabyte-sized databases. While it is not in-memory like SAP HANA, it offers excellent performance through a rich SQL layer, patented indexing, and a disk-backed store. SAP IQ and SAP HANA are integrated through smart data access, which allows remote tables to be queried as though they were local tables. This provides real-time analytics along with data scalability.
In essence, SAP HANA smart data access enables the creation of a logical data warehouse, in which data in HANA, IQ, and Hadoop is mapped at a higher level, freeing the analyst from needing to know exactly where in the landscape the data resides. This amplifies the value of big data across your data fabric by letting you work with data sets stored in a variety of places, including Hadoop.
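The idea of a logical data warehouse can be illustrated with a small sketch. The Python below is not SAP code and does not use any SAP API; the `LogicalCatalog` class, the table names, and the store labels are all hypothetical. It only shows the principle described above: the analyst asks for a table by its logical name, and a catalog decides which system holds it.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalCatalog:
    """Hypothetical catalog mapping logical table names to the store holding them."""
    locations: dict = field(default_factory=dict)

    def register(self, table: str, store: str) -> None:
        # Record which system (for example "HANA", "IQ", or "Hadoop") holds the table.
        self.locations[table] = store

    def resolve(self, table: str) -> str:
        # The analyst supplies only the logical name; the catalog finds the store.
        try:
            return self.locations[table]
        except KeyError:
            raise LookupError(f"no store registered for table {table!r}") from None


# Illustrative names only; none of these tables come from the text.
catalog = LogicalCatalog()
catalog.register("sales_today", "HANA")
catalog.register("sales_history", "IQ")
catalog.register("clickstream_raw", "Hadoop")
```

In a real landscape this mapping would be maintained by the database itself rather than by application code; the sketch simply makes the indirection visible.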
SAP HANA can access data in other data sources such as Hadoop to extend the reach of its processing power. Hadoop provides vast and flexible storage for data objects, independent of their structure and size. Hadoop is well positioned to store the very large data sets that are too big to fit into memory and that require a preprocessing step before they can be easily analyzed. By connecting SAP HANA to Hadoop, you can run jobs in Hadoop that load their results into HANA, which then provides super-fast final analysis.
When you put SAP HANA, SAP IQ, and Hadoop together, you have three data processing domains with different strengths that combine to form a big data processing backbone. Together, these three components provide real-time capabilities along with extreme scale. Data can be processed with the appropriate technology depending on its characteristics: hot data in HANA, warm data in IQ, and a vast data lake in Hadoop, where data can be stored, processed, and aggregated without constraints on size, format, or cleanliness.
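The hot, warm, and data lake split can be sketched as a simple placement rule. The Python below is an illustration only: the `choose_tier` function, the 30-day and 365-day thresholds, and the `is_structured` flag are assumptions made for this example and do not come from SAP. Real placement decisions depend on workload, cost, and latency requirements.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds chosen for illustration; real cut-offs vary by workload.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def choose_tier(last_accessed: datetime, now: datetime, is_structured: bool = True) -> str:
    """Return the store a data set would be placed in under these assumed rules."""
    if not is_structured:
        # Raw or unstructured data goes to the data lake regardless of age.
        return "Hadoop"
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "HANA"   # hot: recently used, needs real-time access
    if age <= WARM_WINDOW:
        return "IQ"     # warm: less time-critical, disk-backed
    return "Hadoop"     # cold: kept in the data lake


now = datetime(2014, 6, 1)
recent = choose_tier(datetime(2014, 5, 20), now)
older = choose_tier(datetime(2013, 12, 1), now)
```

Under these assumed thresholds, `recent` resolves to the HANA tier and `older` to the IQ tier.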
SAP Data Services is a sophisticated ETL and text processing tool, and SAP ESP can capture streaming sources of machine-generated data. SAP BW is a rich data warehouse layer on top of SAP HANA. SAP BusinessObjects BI universes can pull Hadoop and database sources together to serve up information to business applications. All of this technology works together to bring big data into the enterprise (see Figure 3).
SAP® Event Stream Processor (SAP ESP) is a mature, high-throughput complex event processing engine that integrates real-time data streams into the big data environment. It is a key tool for building real-time applications that help to formulate a response to real-time data.
Figure 3: SAP HANA architecture in the context of big data.
The World’s Biggest Big Data Database
In early 2014, SAP set a new world record for the world’s largest data warehouse using the SAP HANA® platform and SAP® IQ software. This independently audited 12.1 PB data warehouse has been recognized by Guinness World Records, and is four times larger than the prior record.
This new world record demonstrates the ability of SAP HANA and SAP IQ to efficiently handle extreme-scale enterprise data warehousing and Big Data analytics. SAP and its partners had previously set a world record for loading and indexing Big Data at 34.3 terabytes per hour.
SAP’s big data technology simplifies the IT landscape. SAP HANA provides speed for dealing with big data in real time. It can also speed up traditional enterprise data warehouse applications, putting them on steroids. The complementary nature of SAP HANA, SAP IQ, and Hadoop supports every big data use case, whether it’s driven by BI analysts, data scientists, or IT seeking to help big data inform the real-time enterprise.