Thus far, we’ve focused on the transition to in-memory computing and its implications for IT. With this information as background, we next “dive into the deep end” of SAP HANA. Before we do so, however, here are a few basic concepts about in-memory computing that you’ll need to understand. Some of these concepts might be similar to what you already know about databases and server technology. There are also some cutting-edge concepts, however, that merit discussion.
SAP HANA is an IN-MEMORY database:
An in-memory database stores all of its data in main memory (RAM). No time is wasted loading data from hard disk into RAM, or juggling data during processing with some of it in RAM and some temporarily on disk. Everything is in memory all the time, which gives the CPUs quick access to the data for processing.
The speed advantages of RAM storage are amplified further by multi-core CPUs, multiple CPUs per board, and multiple boards per server appliance.
Storing data in memory isn’t a new concept. What is new is that you can now store all of your operational and analytic data entirely in RAM as the primary persistence layer [5]. Historically, database systems were designed to perform well on computer systems with limited RAM. As we have seen, slow disk I/O was the main bottleneck in data throughput in these systems. Today, multi-core CPUs (multiple CPUs located on one chip or in one package) are standard, with fast communication between processor cores enabling parallel processing. Currently, server processors have up to 64 cores, and 128 cores will soon be available. With the increasing number of cores, CPUs are able to process increased data volumes in parallel. Main memory is no longer a limited resource. In fact, modern servers can have 2TB of system memory, which allows them to hold complete databases in RAM. Significantly, this arrangement shifts the performance bottleneck from disk I/O to the data transfer between CPU cache and main memory (which is already blazing fast and getting faster).
In a disk-based database architecture, there are several levels of caching and temporary storage to keep data closer to the application and avoid excessive numbers of round-trips to the database (which slows things down). The key difference with SAP HANA is that all of those caches and layers are eliminated because the entire physical database is literally sitting on the motherboard and is therefore in memory all the time. This arrangement dramatically simplifies the architecture.
It is important to note that there are quite a few technical differences between a database that was designed to be stored on disk and one that was built to be entirely resident in memory. There’s a techie book [6] on all those conceptual differences if you really want to get down into the details. What follows here is a brief summary of some of the key advantages of SAP HANA over its aging disk-based cousins.
With SAP HANA, all relevant data is available in main memory, which avoids the performance penalty of disk I/O completely. Either disk or solid-state drives are still required for permanent persistency in the event of a power failure or some other catastrophe. This doesn’t slow down performance, however, because the required backup operations to disk can take place asynchronously as a background task.
Parallel processing is a “divide and conquer” strategy to increase business process throughput and cut processing time by engaging more system resources.
Multiple CPUs can now process parallel requests in order to fully utilize the available computing resources. So, not only is there a bigger “pipe” between the processor and database, but this pipe can send a good deal of data to hundreds of processors at the same time so that they can crunch more data without waiting for anything.
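The “divide and conquer” idea can be illustrated with a minimal sketch: split a column of values into partitions, aggregate each partition independently, and combine the partial results. The function names and the sample data are assumptions made for illustration only, and a thread pool is used purely to keep the example short; it says nothing about how SAP HANA schedules work across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Aggregate one partition of the data independently of the others."""
    return sum(chunk)


def parallel_sum(values, workers=4):
    """Split values into roughly equal partitions and sum them concurrently."""
    if not values:
        return 0
    size = -(-len(values) // workers)  # ceiling division: partition length
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one partition; the partial results are combined.
        return sum(pool.map(partial_sum, chunks))


sales = list(range(1, 1001))  # hypothetical column of 1,000 values
total = parallel_sum(sales)
```

The design point is that each partition can be processed without waiting for any other, so adding workers shortens the elapsed time as long as the partitions stay independent.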
Conceptually, a database table is a two-dimensional data structure with cells organized in rows and columns, just like a Microsoft Excel spreadsheet. Computer memory, in contrast, is organized as a linear structure. To store a table in linear memory, two options exist: row-based storage and column storage.
Row Storage – The data sequence consists of the data fields in one table row. A row-oriented storage system stores a table as a sequence of records, each of which contains the fields of one row. Relational databases typically use row-based data storage.
Column Storage – The data sequence consists of the entries in one table column. A column-oriented storage system stores the entries of each column in contiguous memory locations. Column-based storage is more suitable for many business applications.
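The two layouts can be made concrete with a small sketch that flattens the same table into a linear sequence both ways. The table contents and function names are invented for illustration and do not reflect SAP HANA internals.

```python
# A tiny table with three columns: id, customer, amount (hypothetical data).
table = [
    (1, "Alice", 300),
    (2, "Bob", 150),
    (3, "Carol", 450),
]


def to_row_layout(rows):
    """Linearize the table record by record (row storage)."""
    return [field for row in rows for field in row]


def to_column_layout(rows):
    """Linearize the table column by column (column storage)."""
    return [field for column in zip(*rows) for field in column]


row_layout = to_row_layout(table)
column_layout = to_column_layout(table)

n_rows, n_cols = len(table), len(table[0])

# In the column layout, the "amount" column is one contiguous slice.
amounts = column_layout[2 * n_rows:3 * n_rows]

# In the row layout, the same values are scattered and need a strided read.
amounts_from_rows = row_layout[2::n_cols]
```

Summing the amounts touches one contiguous block in the column layout, whereas fetching a complete record is the contiguous operation in the row layout. That trade-off is what the lists of advantages for each storage type reflect.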
SAP HANA is a “hybrid” database that uses both methods simultaneously to provide an optimal balance between them. SAP HANA supports both row-based and column-based storage, and is particularly optimized for column-based storage.
The SAP HANA database allows the application developer to specify whether a table is to be stored column-wise or row-wise. It also enables the developer to alter an existing table from columnar to row-based and vice versa. The decision to use columnar or row-based tables is typically determined by how the data will be used and which method is the most efficient for that type of usage.
Column-based tables have advantages in the following circumstances:
Calculations are typically executed on a single column or a few columns only.
The table is searched based on values of a few columns.
The table has a large number of columns.
The table has a large number of rows, so that columnar operations are required (aggregate, scan, etc.).
High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows).
Row-based tables have advantages in the following circumstances:
The application only needs to process a single record at one time. (This applies to many selects and/or updates of single records.)
The application typically needs to access a complete record (or row).
The columns contain primarily distinct values so that the compression rate would be low.
Neither aggregations nor fast searching is required.
The table has a small number of rows (e.g., configuration tables).
Compression is the process of reducing the amount of storage needed to represent a certain set of information.
The column store allows for the efficient compression of data. This makes it less costly for the SAP HANA database to keep data in main memory. It also speeds up searches and calculations.
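One widely used technique for compressing a column with few distinct values is dictionary encoding. The sketch below is a generic illustration of that technique, not a description of SAP HANA’s actual compression methods; the function names and the sample column are assumptions.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(column))
    position = {value: code for code, value in enumerate(dictionary)}
    codes = [position[value] for value in column]
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def count_equal(dictionary, codes, value):
    """Count matching rows by comparing integer codes instead of raw values."""
    code = bisect_left(dictionary, value)
    if code == len(dictionary) or dictionary[code] != value:
        return 0  # value does not occur in the column at all
    return codes.count(code)


country = ["DE", "US", "DE", "FR", "US", "DE"]  # hypothetical column
dictionary, codes = dictionary_encode(country)
```

Each distinct value is stored once, and the column itself shrinks to a list of small integers. A search then compares integers rather than strings, which is one reason compression can speed up scans as well as save memory.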
Because of the innovations in hybrid row/column storage in SAP HANA, companies can typically achieve compression ratios of between 5x and 10x on the raw data. This means that, in the best case, 5TB of raw data can fit onto an SAP HANA server that has 1TB of RAM. SAP typically recommends that companies double the estimated compressed table data to determine the amount of RAM needed, in order to account for real-time calculations, swap space, the OS, and other associated programs beyond just the raw table data.
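The sizing rule of thumb described here reduces to simple arithmetic. The helper below is a hypothetical sketch of that calculation (the function name and the default factor of 2 are taken from the rule as stated in the text, not from any SAP sizing tool).

```python
def estimate_ram_tb(raw_tb, compression_ratio, headroom_factor=2.0):
    """Estimate RAM in TB: compress the raw data, then double it for headroom."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    compressed_tb = raw_tb / compression_ratio
    return compressed_tb * headroom_factor


best_case = estimate_ram_tb(5, compression_ratio=10)   # 5TB raw at 10x
worst_case = estimate_ram_tb(5, compression_ratio=5)   # 5TB raw at 5x
```

Under these assumptions, 5TB of raw data needs about 1TB of RAM at 10x compression and about 2TB at 5x, so the achievable compression ratio has a direct effect on hardware sizing.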
Compression is automatically calculated and optimized as part of the delta merge operation. If you create an empty column table, no compression is applied initially as the database cannot know which method is most appropriate. As you start to insert data into the table and the delta merge operation starts being executed at regular intervals, data compression is automatically (re)evaluated and optimized.
The SAP HANA database persistence layer stores data in persistent disk volumes (either hard disk or solid-state drives). The persistence layer ensures that changes are durable and that the database can be restored to the most recent committed state after a restart. SAP HANA uses an advanced delta-insert approach for rapid backup and logging. If power is lost, the data in RAM is lost. However, because the persistence layer manages restore points and backups at such high speeds (from RAM to SSD), and because recovery from SSD to RAM is so much faster than recovery from a regular disk, you actually “lose” less data and recover much faster than in a traditional disk-based architecture.
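The general idea of restoring an in-memory store to its most recent committed state can be sketched with an append-only log that is replayed on restart. This is a deliberately simplified, generic illustration; the class name, file format, and synchronous log write are assumptions and do not describe SAP HANA’s actual logging or savepoint mechanism.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory key-value store backed by an append-only log on disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log after a (re)start."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def commit(self, key, value):
        """Write the change to the log first, then apply it in memory."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the entry onto the storage device
        self.data[key] = value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "redo.log")
    store = TinyStore(path)
    store.commit("order-1", 300)
    store.commit("order-2", 150)
    del store                      # simulate losing everything held in RAM
    recovered = TinyStore(path)    # restart: state is rebuilt from the log
    recovered_data = dict(recovered.data)
```

All reads are served from memory, and the disk is touched only to record committed changes and to rebuild state after a restart.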
SAP has a surprisingly long history of developing in-memory technologies to accelerate its applications. Because disk I/O has been a performance bottleneck since the beginning of three-tier architecture, SAP has constantly searched for ways to avoid or minimize the performance penalty that customers pay when they pull large data sets from disk. So, SAP’s initial in-memory technologies were used for very specific applications that contained complex algorithms that needed a great deal of readily accessible data.
When SAP introduced Advanced Planner and Optimizer (APO) as part of its supply chain management application in the late 1990s, the logistics planning algorithms required a significant speed boost to overcome the disk I/O bottleneck. These algorithms, some of the most complex that SAP has ever written, needed to crunch massive amounts of product, production, and logistics data to produce an optimal supply chain plan. SAP solved this problem in 1999 by taking some of the capabilities of its open-source database, SAP MaxDB (called SAP DB at the time), and building them into a memory-resident cache system called SAP LiveCache. Basically, LiveCache keeps a persistent copy of all of the relevant application logic and master data in memory, thus eliminating the need to make multiple trips back and forth to the disk. LiveCache worked extremely well; in fact, it processed data 600 times faster than disk-based I/O. Within its narrow focus, it clearly demonstrated that in-memory caching could solve a major latency issue for SAP customers.
In 2003, a team at SAP’s headquarters in Walldorf, Germany, began to productize a specialized search engine for SAP systems called TREX (Text Retrieval and information EXtraction). TREX approached enterprise data in much the same way that Google approaches internet data. That is, TREX scans the tables in a database and then creates an index of the information contained in the tables. Because the index is a tiny fraction of the size of the actual data, the TREX team came up with the idea of putting the entire index in the RAM of the server to speed up searches of the index. When this technology became operational, their bosses asked them to apply the same technique to a much more imposing problem: the data from an SAP BW cube. Thus, Project Euclid was born.
At that time, many of the larger SAP BW customers were having significant performance issues with reports that were running on large data cubes. Cubes are the basic mechanism by which SAP BW stores data in multidimensional structures. Running reports on very large cubes (>100GB) was taking several hours, sometimes even days. The SAP BW team had done just about everything possible in the SAP BW application to increase performance, but had run out of options in the application layer. The only remaining solution was to eliminate the bottleneck itself. In the best spirit of disruptive innovators, the TREX team devised a strategy to eliminate the database from the equation entirely by indexing the cubes and storing the indexes in high-speed RAM.
Initial results for Euclid were mind-blowing: The new technology could execute query responses for the same reports on the same data thousands of times faster than the old system. Eventually, the team discovered how to package Euclid into a stand-alone server that would sit next to the existing SAP BW system and act as a non-disruptive “turbocharger” for a customer’s slow SAP BW reports. At the same time, SAP held some senior-level meetings with Intel to formulate a joint-engineering project to optimize Intel’s new dual-core chips to natively process the SAP operations in parallel, thereby increasing performance exponentially. Intel immediately sent a team to SAP headquarters to begin the optimization work. Since that time the two companies have continuously worked together to optimize every successive generation of chips.
In 2005, SAP launched the product SAP NetWeaver Business Intelligence Accelerator, or BIA. (The company subsequently changed the name to SAP NetWeaver Business Warehouse Accelerator, or BWA.) BWA has since evolved into one of SAP’s best-selling products, with one of the highest customer satisfaction ratings. BWA solved a huge pain point for SAP customers. Even more importantly, however, it represented another successful use of in-memory technology. Along with LiveCache, the success of BWA proved to SAP and its customers that in-memory data processing just might be an architectural solution to database bottlenecks.
Once the results for BWA and LiveCache began to attract attention, SAP decided to take the next big step and determine whether it could run an entire database for an SAP system in memory. As we’ll see later, this undertaking is a lot more complicated than it sounds. Using memory as a cache to temporarily store data or storing indexes of data in memory were key innovations, but eliminating the disk completely from the architecture takes the concept to an entirely different level of complexity and introduces a great many unknown technical issues into the landscape.
Therefore, in 2005, SAP decided to build a skunkworks project to validate and test the idea. The result was the Tracker Project. Because the new SAP database was in an early experimental stage and the final product could seriously disrupt the market, the Tracker Project was strictly “Top Secret,” even to SAP employees.
The Tracker team was composed of the TREX/BWA engineers, a few of the key architects from the SAP MaxDB open-source database team, the key engineers who built LiveCache, the SAP ERP performance optimization and benchmarking gurus, and several database experts from outside the company. Basically, the team was an all-star lineup of everyone inside and outside SAP who could contribute to this “big hairy audacious goal” of building the first in-memory database prototype for SAP (the direct ancestor of SAP HANA).
In the mid-1990s, several researchers at Stanford University had performed the first experiments to build an in-memory database for a project at HP Labs. Two of the Stanford researchers went on to found companies to commercialize their research. One product was a database query optimization tool known as Callixa, and the other was a native in-memory database called P*Time. In late 2005, SAP quietly acquired Callixa and P*Time (as well as a couple of other specialist database companies), hired several of the most distinguished database geniuses on the planet, and put them to work with the Tracker team. The team completed the porting and verification of the in-memory database on a server with 64GB of RAM, which was the maximum supported memory at that time.
In early 2006, less than four months after the start of the project, the Tracker team passed its primary performance and “reality check” goal: the 1,000-user SD two-tier SAP Standard Application Benchmark with more than 6,000 SAPS, which essentially matched the performance of the two leading certified databases at the time. To put that in perspective, it took Microsoft several years of engineering to port Microsoft SQL Server to SAP and pass the benchmark the first time. Passing the benchmark in such a short time with a small team, and in total secrecy, was a truly amazing feat. Suddenly, an entirely new world of possibilities had opened up for SAP to fundamentally change the rules of the game for database technology.
Shortly after achieving this milestone, SAP began an academic research project to experiment with the inner workings of in-memory databases with faculty and students at the Hasso Plattner Institute at the University of Potsdam in Germany. The researchers examined the prototypes from the Tracker team — now called NewDB — and added some valuable external perspectives on how to mature the technology for enterprise applications.
However, passing a benchmark and running tests in the labs are far removed from the level of scalability and reliability needed for a database to become the mission-critical heart of a Fortune 50 company. So, for the next four years, SAP embarked on a “bullet-proofing” effort to evolve the “project” into a “product”.
In May 2010, Hasso Plattner, SAP’s supervisory board chairman and chief software advisor, announced SAP’s vision for delivering an entirely in-memory database layer for its application portfolio. If you haven’t seen his keynote speech, it’s worth watching. If you saw it when he delivered it, it’s probably worth watching again. It’s Professor Plattner at his best.
One year later, SAP announced the first live customers on SAP HANA and that SAP HANA was now generally available. SAP also introduced the first SAP applications that were being built natively on top of SAP HANA as an application platform. Not only did these revelations shock the technology world into the “new reality” of in-memory databases, but they initiated a massive shift for both SAP and its partners and customers into the world of “real-time business”.
In November 2011, SAP achieved another milestone when it released SAP Business Warehouse 7.3. SAP had renovated this software so that it could run natively on top of SAP HANA. This development sent shockwaves throughout the data warehousing world because almost every SAP Business Warehouse customer could immediately replace their old, disk-based database with SAP HANA. What made this new architecture especially attractive was the fact that SAP customers did not have to modify their current systems to accommodate it. To make the transition as painless as possible for its customers, SAP designed Business Warehouse 7.3 to be a non-disruptive innovation.
Clay Christensen’s book The Innovator’s Dilemma was very popular reading among the Tracker team during the early days. In addition to all the technical challenges of building a completely new enterprise-scale database from scratch on a completely new hardware architecture, SAP also had to be very thoughtful about how its customers would eventually adopt such a fundamentally different core technology underneath the SAP Business Suite.
To accomplish this difficult balancing act, SAP’s senior executives made the team’s primary objective the development of a disruptive technology innovation that could be introduced into SAP’s customers’ landscapes in a non-disruptive way. They realized that even the most incredible database would be essentially useless if SAP’s customers couldn’t make the business case to adopt it because it was too disruptive to their existing systems. The team spoke, under NDA, with the senior IT leadership of several of SAP’s largest customers to obtain insights concerning the types of concerns they would have about such a monumental technology shift at the bottom of their “stacks.” The customers provided some valuable guidelines for how SAP should engineer and introduce such a disruptive innovation into their mission-critical landscapes. Making that business case involved much more than just the eye-catching “speeds and feeds” from the raw technology. SAP’s customers would switch databases only if the new database was minimally disruptive to implement and extremely low risk to operate. In essence, SAP would have to build a hugely disruptive innovation to the database layer that could be adopted and implemented by its customers in a non-disruptive way at the business application layer.
When viewed from a holistic perspective, the entire “stack” needed to run a Fortune 50 company is maddeningly complex. So, to engineer a new technology architecture for a company, you first have to focus on WHAT the entire system has to do for the business. At its core, the new SAP database architecture was created to help users run their business processes more effectively. It had to enable them to track their inventory more accurately, sell their products more effectively, manufacture their products more efficiently, and purchase materials economically. At the same time, however, it also had to reduce the complexity and costs of managing the landscape for the IT department.
Today, every business process in a company has some amount of “latency” associated with it. For example, one public company might require 10 days to complete its quarterly closing process, while its primary competitor accomplishes this task in 5 days — even though both companies are using the same SAP software to manage the process. Why does it take one company twice as long as its competitor to complete the same process? What factors contribute to that additional “process latency”?
The answers lie in the reality that the software is simply the enabler for the execution of the business process. The people who have to work together to complete the process, both inside and outside the company, often have to do a lot of “waiting” both during and between the various process steps. Some of that waiting is due to human activities, such as lunch breaks or meetings. Much of it, however, occurs because people have to wait while their information systems process the relevant data. The old saying that “time is money” is still completely true, and “latency” is just a nice way of saying “money wasted while waiting.”
As we discussed earlier, having to wait several minutes or several hours or even several days to obtain an answer from your SAP system is a primary contributor to process latency. It also discourages people from using the software frequently or as it was intended. Slow-performing systems force people to take more time to complete their jobs, and they result in less effective use of all the system’s capabilities. Both of these factors introduce latency into process execution.
Clearly, latency is a bad thing. Unfortunately, however, there’s an even darker side to slow systems. When business people can’t use a system to get a quick response to their questions or get their job done when they need to, they invent workarounds to avoid the constraint. The effort and costs spent on “inventing” workarounds to the performance limitations of the system waste a substantial amount of institutional energy and creativity that ideally should be channeled into business innovation. In addition, workarounds can seriously compromise data quality and integrity.
As we have discussed, the major benefits of in-memory storage are that users no longer have to wait for the system, and the information they need to make more intelligent decisions is instantly available at their fingertips. Thus, companies that employ in-memory systems are operating in “real time.” Significantly, once you remove all of the latency from the systems, users can focus on eliminating the latency in the other areas of the process. It’s like shining a spotlight on all the problem areas of the process, now that the system latency is no longer clouding up business transparency.
In addition to speeding up database I/O throughput and simplifying the enterprise system architecture, SAP also had to innovate in a third direction: business flexibility. Over the years, SAP had become adept at automating “standard” business processes for 24 different industries globally. Despite this progress, however, new processes were springing up too fast to count. Mobile devices, cloud applications, and big data scenarios were creating a whole new set of business possibilities for customers. SAP’s customers needed a huge amount of flexibility to modify, extend, and adapt their core business processes to reflect their rapidly changing business needs. In 2003, SAP released their service-oriented architecture, SAP NetWeaver, and began to renovate the entire portfolio of SAP apps to become extremely flexible and much easier to modify. However, none of that flexibility was going to benefit their customers if the applications and platform that managed those dynamic business processes were chained to a slow, inflexible, and expensive database.
The only way out of this dilemma was for SAP to innovate around the database problem entirely. None of the existing database vendors had any incentive to change the status quo (see The Innovator’s Dilemma for all the reasons why), and SAP couldn’t afford to sit by and watch these problems continue to get worse for their customers. SAP needed to engineer a breakthrough innovation in in-memory databases to build the foundations for a future architecture that was faster, simpler, more flexible, and much cheaper to acquire and operate. It was one of those impossible challenges that engineers and business people secretly love to tackle, and it couldn’t have been more critical to SAP’s future success.
There’s another fundamental law of the technology industry: Faster, Better, Cheaper. That is, each new generation of product or technology has to be faster, better, and cheaper than the generation it is replacing, or customers won’t purchase it. Geoffrey Moore has some great thoughts on how game-changing technologies “cross the chasm.” He maintains, among other things, that faster, better, and cheaper are fundamental characteristics that must be present for a successful product introduction.
In-memory computing fits the faster, better, cheaper model perfectly. I/O is thousands to millions of times faster on RAM than on disks. There’s really no comparison between how rapidly you can get data out of a database in RAM and how rapidly you can get it out of a database on disk. In-memory databases are a better architecture due to their simplicity, tighter integration with the apps, hybrid row/column store, and ease of operations. Finally, when you compare the cost of an in-memory database to that of a disk-based database on the appropriate metric (cost per terabyte per second), in-memory is actually cheaper. Also, when you compare the total cost of ownership (TCO) of in-memory databases, they’re even more economical to operate than traditional databases due to the reduction of superfluous layers and unnecessary tasks.
But faster, better, cheaper is even more important than just the raw technology. If you really look at what the switch from an “old” platform to a “new” platform can do for overall usability of the solutions on top of the platform, there are some amazing possibilities.
Take the ubiquitous iPod, for example. When Apple introduced the iPod in 2001, it revolutionized the way that people listened to music, even though it wasn’t the first MP3 player on the market. The key innovation was that Apple was able to fit a tiny 1.8-inch hard drive into its small case so you could carry 5GB of music in your pocket, at a time when most other MP3 players could hold only ~64MB of music in flash memory. (This is a classic illustration of “changing the rules of the game.”) I/O speed wasn’t a significant concern for playing MP3s, so the cost per megabyte per second calculation wasn’t terribly relevant; plain cost per megabyte was what mattered. By that measure, 5GB of disk for roughly the same price as 64MB of flash memory was a huge difference. The iPod wasn’t significantly faster than its competitors, but it was so phenomenally better and cheaper per megabyte (even at $399) that it became a category killer.
In hindsight, Apple had to make several architectural compromises to squeeze that hard drive into the iPod. First, the hard drive took up most of the case, leaving very little room for anything else. There was a tiny monochrome display, a clunky mechanical “click wheel” user interface, a fairly weak processor, and, most importantly, a disappointingly short battery life. The physics needed to spin a hard disk drained the battery very quickly. Despite these limitations, however, the iPod was still so much better than anything else out there that it soon took over the market.
Fast-forward six years, and Apple was selling millions of units of its most current version of the “classic” iPod, which contained 160GB of storage, 32 times more than the original 5GB model. Significantly, the new model sold at the same price as the original. In addition to the vastly expanded storage capacity, Apple had added a color screen and a pressure-sensitive “click wheel.” Otherwise, the newer model was similar to the original in most ways.
By this time, however, the storage capacity of the hard drive was no longer such a big deal. Hard drives had become so enormous that nobody had enough music to fill them. In fact, in 2001 people had been thrilled with 5GB of storage, because they could download their entire CD collection onto the iPod. Meanwhile, Moore’s law had been in effect for four full cycles and 16GB of memory cost about the same as a 160GB hard drive. In 2007, Apple could build an iPod with 16GB of solid-state memory, which was only one-tenth of the capacity of the current hard drive model, for the same price as the 2001 model.
It was the shift to solid-state memory as the storage medium for iPods that really changed the game for Apple. Removing the hard drive and its spinning disks had a huge impact on Apple’s design parameters, for several reasons. First, it enabled the company to shrink the thickness and reduce the weight of the iPod, making it easier to carry and store. In addition, it created more room for a bigger motherboard and a larger display. In fact, Apple could now turn the entire front of the device into a display, which it redesigned as a touch-screen interface (hence the name iPod Touch). Inserting a bigger motherboard in turn allowed Apple to insert a larger, more powerful processor in the device. Most importantly, however, eliminating the physical hard drive more than doubled the battery life since there were no more mechanical disks to spin.
These innovations essentially transformed a simple music player into a miniature computer that you could carry in your pocket. It had an operating system, long battery life, audio and video capabilities, and a sufficient amount of storage. Going even further, Apple could also build another model with nearly all of the same parts that could also make phone calls.
Once a large number of people began to carry a computer around in their pocket, it only made sense that developers would build new applications to exploit the capabilities of the new platform. Although Apple couldn’t have predicted the success of games like “Angry Birds,” they realized that innovation couldn’t be unleashed on their new platform until they removed the single biggest piece of the architecture that was imposing all the constraints. Ironically, it was the same piece of technology that made the original iPod so successful. Think about that for a second: Apple had to eliminate the key technology in the iPod that had made them so successful in order to move to the next level of success with the iPod Touch and the iPhone. Although this might seem like an obvious choice in retrospect, at that time it required a huge leap of faith to take.
In essence, getting rid of the hard drive in the iPods was the most critical technology decision Apple made to deliver the iPod Touch, iPhone, and, eventually, the iPad. Most of the other pieces of technology in the architecture improved as expected over the years. But the real game changer was the switch from disk to memory. That single decision freed Apple to innovate without constraints and allowed them to change the rules of the game again, back to the memory-as-storage paradigm that the portable music player market had started with.
SAP is convinced that SAP HANA represents a similar architectural shift for its application platform. Eliminating the disk-based database will provide future customers with a faster, better, and cheaper architecture. SAP also believes that this new architecture, like the solid-state memory in the iPod, will encourage the development of a new breed of business applications that are built natively to exploit this new platform.