SAP HANA Introduction and Architectural Overview
Every industry has a certain set of “rules” that govern the way the companies in that industry operate. The rules might be adjusted from time to time as the industry matures, but the general rules stay basically the same — unless some massive disruption occurs that changes the rules or even the entire game. SAP HANA is one of those massively disruptive innovations for the enterprise IT industry.
To understand this point, consider that you’re probably reading this book on an e-reader, which is a massively disruptive innovation in the positively ancient publishing industry. The book industry has operated under the same basic rules since Gutenberg mechanized the production of books around 1440. There were a few subsequent innovations within the industry, primarily in the distribution chain, but the basic processes of writing a book, printing it, and reading it remained largely unchanged for several hundred years. That is, until Amazon and Apple came along and digitized the production, distribution, and consumption of books. These companies are also starting to revolutionize the writing of books by providing new authoring tools that make the entire process digital and paper-free. Together, these changes amount to an overwhelming assault of disruptive innovation on a 500-year-old industry in less than five years.
Today, SAP HANA is disrupting the technology industry in much the same way that Amazon and Apple have disrupted the publishing industry. Before we discuss how this happens, we need to consider a few fundamental rules of that industry.
The IT Industry: A History of Technology Constraints
Throughout the history of the IT industry, the capabilities of applications have been constrained to a great degree by the capabilities of the hardware they were designed to run on. This explains the “leapfrogging” pattern of software and hardware products, in which a more capable version of an application is released shortly after a newer, more capable generation of hardware (processors, storage, memory, and so on) arrives. For example, each version of Adobe Photoshop was designed to make full use of the most current hardware available in order to achieve optimal performance. Rendering a large image in Photoshop 10 years ago could take several hours on the most powerful PC. In contrast, the latest version, run on current hardware, can perform the same task in a couple of seconds, even on a low-end PC.
Enterprise software has operated on a very similar model. In the early days of mainframe systems, all of the software (specifically, the applications, operating system, and database) was designed to make maximum use of the hardware resources inside the mainframe as a self-contained system. The transactional data from the application and the data used for reporting were physically stored in the same system. Consequently, you could either process transactions or run reports, but you couldn’t do both at the same time or you’d kill the system. Basically, the application could use whatever processing power was in the mainframe, and that was it. If you wanted more power, you had to buy a bigger mainframe.
The Database Problem: Bottlenecks
When SAP R/3 came out in 1992, it was designed to take advantage of a new hardware architecture — client-server — where the application could be run on multiple, relatively cheap application servers connected to a larger central database server. The major advantage of this architecture was that, as more users performed more activities on the system, you could just add a few additional application servers to scale out application performance. Unfortunately, the system still had a single database server, so transmitting data from that server to all the application servers and back again created a huge performance bottleneck.
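The scale-out limit described above can be made concrete with a toy capacity model. This is a minimal sketch under stated assumptions, not a model SAP published: the request rates and database capacity are hypothetical numbers, and `db_utilization` and `effective_throughput` are illustrative names. Adding application servers raises demand linearly, but the requests actually served stop growing once the single database server is saturated.

```python
# Toy model with hypothetical numbers: n application servers each send
# requests_per_server requests per second to ONE shared database server
# that can complete at most db_capacity requests per second.

def db_utilization(n_app_servers: int, requests_per_server: float, db_capacity: float) -> float:
    """Demand as a fraction of the single database server's capacity (may exceed 1.0)."""
    return n_app_servers * requests_per_server / db_capacity


def effective_throughput(n_app_servers: int, requests_per_server: float, db_capacity: float) -> float:
    """Requests per second actually served: total demand, capped by the database."""
    demand = n_app_servers * requests_per_server
    return min(demand, db_capacity)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        util = db_utilization(n, 500, 4000)
        served = effective_throughput(n, 500, 4000)
        print(f"{n:2d} app servers: demand at {util:.0%} of DB capacity, {served:.0f} req/s served")
```

With the assumed figures, going from 8 to 16 application servers doubles demand to 200% of the database's capacity while the served throughput stays flat at 4,000 requests per second.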
Eventually, the ever-increasing requests for data from so many application servers began to crush even the largest database servers. The problem wasn’t that the servers lacked sufficient processing power. Rather, the requests from the application servers got stuck in the same input/output (I/O) bottleneck trying to get data in and out of the database. To address this problem, SAP engineered quite a few “innovative techniques” into its applications to minimize the number of times they needed to access the database. Despite these innovations, however, each additional database operation continued to slow down the entire system.
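The paragraph does not name SAP’s specific techniques, so the following is only a generic illustration of the idea of cutting database round trips: a hypothetical read-through buffer (`ReadThroughCache`) that keeps previously read records in application memory and goes to the database only on a miss. The dictionary-backed “database” in the usage example is a stand-in assumption, not an SAP interface.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Hypothetical application-side buffer: repeated reads are served from
    memory, and the database loader is called only when a key is missing."""

    def __init__(self, load_from_db: Callable[[Hashable], Any]):
        self._load_from_db = load_from_db
        self._buffer: Dict[Hashable, Any] = {}
        self.db_reads = 0  # number of round trips actually made to the database

    def get(self, key: Hashable) -> Any:
        if key not in self._buffer:
            self.db_reads += 1
            self._buffer[key] = self._load_from_db(key)
        return self._buffer[key]

    def invalidate(self, key: Hashable) -> None:
        # Drop a buffered entry so the next get() re-reads it from the database.
        self._buffer.pop(key, None)


if __name__ == "__main__":
    # Stand-in for a database table of 10 materials (hypothetical data).
    fake_material_table = {f"MAT-{i}": {"description": f"Material {i}"} for i in range(10)}
    cache = ReadThroughCache(lambda key: fake_material_table[key])
    for i in range(1000):
        cache.get(f"MAT-{i % 10}")
    print(cache.db_reads)  # 10: a thousand reads cost only ten database trips
```

The trade-off is freshness: buffered records go stale when the underlying rows change, which is why this kind of buffering suits rarely changing data and needs an invalidation path such as `invalidate`.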
This bottleneck was even more pronounced when it came to reporting. The transactional data generated by online transaction processing (OLTP), from documents such as purchase orders and production orders, was stored in multiple locations within the database. The application would read a small quantity of data when the purchasing screen was opened, the user would enter more data, the application would read a bit more data from the database, and so on, until the transaction was completed and the record was updated for the last time. Each transactional record by itself doesn’t contain very much data. When you have to run a report across every transaction in a process for several months, however, you start dealing with huge amounts of data that have to be pulled through a very slow “pipe” from the database to the application.
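The cost of pulling report data through a slow pipe is, to a first approximation, data volume divided by throughput. The sketch below computes that for a hypothetical workload; the record count, record size, and both throughput figures are assumptions chosen for round numbers, not measurements of any SAP system. The memory figure is included only to show how the same formula behaves when the pipe is much wider, which is the premise of the in-memory approach discussed later in this chapter.

```python
def transfer_seconds(num_records: int, bytes_per_record: int, throughput_bytes_per_s: float) -> float:
    """Raw time to move num_records records through a channel of the given throughput."""
    return num_records * bytes_per_record / throughput_bytes_per_s


# Hypothetical workload: several months of transaction line items.
RECORDS = 50_000_000        # assumed number of records scanned by the report
RECORD_SIZE = 1_000         # assumed bytes per record
DISK_THROUGHPUT = 200e6     # assumed sustained read rate from disk, bytes/s
MEMORY_THROUGHPUT = 10e9    # assumed memory bandwidth available to one scan, bytes/s

if __name__ == "__main__":
    print(f"disk:   {transfer_seconds(RECORDS, RECORD_SIZE, DISK_THROUGHPUT):.0f} s")    # disk:   250 s
    print(f"memory: {transfer_seconds(RECORDS, RECORD_SIZE, MEMORY_THROUGHPUT):.0f} s")  # memory: 5 s
```

Under these assumptions, the scan takes about 250 seconds from disk versus about 5 seconds from memory; real systems add query processing, network transfer, and contention on top of raw transfer time.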
To create reports, the system must read multiple tables in the database all at once and then sort the data into reports. This process requires the system to pull a massive amount of data from the database, which essentially prevents users from doing anything else in the system while it’s generating the report. To resolve this problem, companies began to build separate online analytical processing (OLAP) systems, such as SAP Business Warehouse, to copy the transaction data over to a separate server and offload all that reporting activity onto a dedicated “reporting” system. This arrangement would free up resources for the transactional system to focus on processing transactions.
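The separation of transactional and reporting systems can be sketched with two in-memory SQLite databases standing in for the OLTP system and the OLAP system. The table names, columns, and the batch `copy_to_reporting` step are hypothetical; real warehouses such as SAP Business Warehouse use their own extraction and modeling layers, which this sketch does not attempt to reproduce.

```python
import sqlite3

# Two separate databases: one stands in for the transactional (OLTP) system,
# the other for the dedicated reporting (OLAP) system. Schema is hypothetical.
oltp = sqlite3.connect(":memory:")
olap = sqlite3.connect(":memory:")

oltp.execute("CREATE TABLE po_item (po_id INTEGER, vendor TEXT, amount REAL)")
olap.execute("CREATE TABLE po_item_copy (po_id INTEGER, vendor TEXT, amount REAL)")

# Transactions are recorded in the OLTP system.
oltp.executemany(
    "INSERT INTO po_item VALUES (?, ?, ?)",
    [(1, "ACME", 120.0), (2, "ACME", 80.0), (3, "Globex", 200.0)],
)
oltp.commit()


def copy_to_reporting(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Batch extract-and-load: read all rows from OLTP and replace the reporting copy."""
    rows = src.execute("SELECT po_id, vendor, amount FROM po_item").fetchall()
    dst.execute("DELETE FROM po_item_copy")
    dst.executemany("INSERT INTO po_item_copy VALUES (?, ?, ?)", rows)
    dst.commit()


def spend_by_vendor(reporting: sqlite3.Connection) -> list:
    """The report reads only the reporting copy, never the OLTP tables."""
    return reporting.execute(
        "SELECT vendor, SUM(amount) FROM po_item_copy GROUP BY vendor ORDER BY vendor"
    ).fetchall()


copy_to_reporting(oltp, olap)
print(spend_by_vendor(olap))  # [('ACME', 200.0), ('Globex', 200.0)]
```

Because the report reads only the copy, it does not compete with transaction processing, but it is only as current as the last copy: rows entered after a load do not appear until the next run. That lag is a form of the data latency this chapter returns to below.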
Unfortunately, even though the servers were getting faster and more powerful (and cheaper), the bottleneck associated with obtaining data from the disk wasn’t getting better; in fact, it was actually getting worse. As more processes in the company were being automated in the transactional system, it was producing more and more data, which would then get dumped into the reporting system. Because the reporting system contained more, broader data about the company’s operations, more people wanted to use the data, which in turn generated more requests for reports from the database under the reporting system. Of course, as the number of requests increased, the quantities of data that had to be pulled correspondingly increased. You can see how this vicious (or virtuous) cycle can spin out of control quickly.
The Solution: In-Memory Architecture
This is the reality that SAP was seeing at its customers at the beginning of the 2000s. SAP R/3 had been hugely successful, and customers were generating dramatically increasing quantities of data. SAP had also just released SAP NetWeaver, which added extensive internet and integration capabilities to its applications. SAP NetWeaver added many new users and disparate systems that talked to the applications in the SAP landscape. Again, the greater the number of users, the greater the number of application servers that flooded the database with requests. Similarly, as the amount of operational data in the SAP Business Warehouse database increased exponentially, so did the number of requests for reports. Looking forward, SAP could see this trend becoming even more widespread and the bottleneck of the database slowing things down more and more. SAP was concerned that customers who had invested massive amounts of time and money into acquiring and implementing these systems to make their businesses more productive and profitable would be unable to get maximum value from them.
Fast forward a few years, and the acquisitions of Business Objects and Sybase were generating another exponential increase in demand for data from both the transactional and analytic databases, driven by growing numbers of analytics users and mobile users. Both the volume of data and the number of users requesting it were now growing thousands of times faster than database I/O was improving.
Having become aware of this issue, SAP initiated several projects in 2004 to rethink the core architecture of its applications and eliminate this performance bottleneck. The objective was to enable customers to leverage the full capabilities of their investment in SAP while avoiding the data latency issues. The timing couldn’t have been better. It was around this time that two other key factors were becoming more significant:
(1) internet use and the proliferation of data from outside the enterprise, and
(2) the regulatory pressures on corporations, generated by laws such as Sarbanes-Oxley, to be answerable for all of their financial transactions.
These factors have increased the pressure on already stressed systems to analyze more data more quickly. The SAP projects resulted in the delivery of SAP HANA in 2011, the first step in the transition to a new in-memory architecture for enterprise applications and databases. SAP HANA flips the old model on its head and converts the database from the “boat anchor” that slows everything down into a “jet engine” that speeds up every aspect of the company’s operations.
SAP HANA Architectural Overview
SAP HANA was initially developed in Java and C++ and designed to run only on the SUSE Linux Enterprise Server 11 operating system. The SAP HANA system consists of multiple components that together deliver its computing power.
The HANA system contains the Name Server, the Preprocessor Server, the Statistics Server, and the XS Engine (which is used to communicate with and host small web applications), along with various other components.
Now that we’ve discussed the key concepts underlying in-memory storage, we can focus more specifically on the SAP HANA architecture. As we noted earlier, conceptually SAP HANA is very similar to most databases you’re familiar with. Applications have to put data in and take data out of the database, data sources have to interface with it, and it has to store and manage data reliably. Despite these surface similarities, however, SAP HANA is quite different “under the hood” from any other database on the market. In fact, SAP HANA is much more than just a database. It includes many tools and capabilities “in the box” that make it much more valuable and versatile than a regular database. In reality, it’s a full-featured database platform.
In what ways is SAP HANA unique? First, it is delivered as a preconfigured, preinstalled appliance on certified hardware. This eliminates many of the typical setup activities and problems you find with regular databases. Second, it includes all of the standard application interfaces and libraries, so developers can get to work with it immediately without learning any proprietary APIs.
Finally, SAP HANA comes with several ways to connect easily to nearly any source system in real time or near real time.
Figure: SAP HANA in-memory appliance
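The point about standard interfaces can be illustrated with Python’s DB-API 2.0 (PEP 249), a driver-neutral specification for connections and cursors. The function below is written only against that interface, so in principle any compliant driver’s connection could be passed in. SAP documents its Python client, `hdbcli`, as implementing this specification, but treat that and any connection details as something to verify in SAP’s documentation. The sketch uses the standard-library `sqlite3` driver so that it runs without a HANA system, and the `orders` table and `open_orders` function are hypothetical.

```python
import sqlite3


def open_orders(conn, min_amount: float) -> list:
    """Uses only DB-API 2.0 calls (cursor, execute, fetchall, close), so the
    same code can run against any compliant driver's connection object."""
    cur = conn.cursor()
    # Placeholder syntax is driver-specific ("?", "%s", ":name", ...); each
    # DB-API module reports its style in a module-level `paramstyle` attribute.
    cur.execute(
        "SELECT order_id, amount FROM orders WHERE amount >= ? ORDER BY order_id",
        (min_amount,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


if __name__ == "__main__":
    # sqlite3 is used purely as a locally available DB-API driver.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0), (3, 300.0)])
    print(open_orders(conn, 100.0))  # [(2, 150.0), (3, 300.0)]
```

Portability has limits: SQL dialects and parameter placeholder styles differ between drivers, so the query text may still need adjusting when the connection comes from a different database.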
These features are designed to make SAP HANA as close to “plug-and-play” as it can be and to make it a non-disruptive addition to your existing landscape. We’ll spend a few moments here explaining these capabilities at a basic level.