Since talking about big data in the abstract can’t provide a clear vision of its benefits, this section offers concrete examples related to BI, data science, and real-time insights.
SAP Business Intelligence delivers insight into every aspect of your organization by distributing information on premise.
Big data technology is needed not only for the many new types of data, but also for large-scale data warehouses. Case in point: the largest data warehouse in the world, as attested by Guinness World Records, holds 12.1 petabytes of data.
To explore another example, consider ARI, the largest fleet management services company in the world. In conjunction with its partners, ARI accounts for more than 2 million vehicles worldwide.
Maintenance management for the entire lifecycle of a single vehicle can involve more than 14,000 data points, including everything from information on minor repairs to regular preventive maintenance information and manufacturer updates and recalls.
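To get a feel for the scale these figures imply, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every one of the 2 million vehicles reaches the 14,000 data points quoted above; real fleets will vary, and the function name is hypothetical.

```python
# Figures quoted in the text; treating them as exact values for every
# vehicle is an assumption made only for this estimate.
VEHICLES = 2_000_000
DATA_POINTS_PER_VEHICLE = 14_000


def estimate_total_data_points(vehicles: int, points_per_vehicle: int) -> int:
    """Return a rough count of maintenance data points across a fleet."""
    return vehicles * points_per_vehicle


total = estimate_total_data_points(VEHICLES, DATA_POINTS_PER_VEHICLE)
print(f"{total:,} data points")  # 28,000,000,000 data points
```

Under that assumption the fleet produces on the order of tens of billions of maintenance data points, before counting any other operational data.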
ARI’s data warehouse was straining under the load of the data. Its in-house ETL solution could not keep up with the growth in data, and analysis was taking far too long. After a proof-of-concept, ARI migrated its data warehouse to SAP HANA.
ARI can now perform deeper data analysis in less than four seconds, work that was previously a manual process taking more than 24 hours. The company also increased efficiency in its call centers and improved first-time call resolution, resulting in higher customer satisfaction.
As ARI’s Director of Information Management, Bill Powell, explained, “There was a sea of information coming in and it could take up to two days to pull together, which affected our service levels. In-memory HANA means we can answer questions in seconds.” ARI’s Keith Allen added, “Our goal has been to drive greater efficiencies. The business can ask questions [of the data] and get responses directly. Customers can also build their own dashboards. It is self-service BI.”
Many companies are considering moving their data warehouses and BI initiatives to SAP HANA and SAP IQ to speed up their analytic capabilities and drive faster value from their data.
Healthcare is one of the most exciting big data stories there is, as seemingly intractable problems are beginning to find solutions through genomic analysis.
Invasive cancer is the second leading cause of death in the US. In 2008, approximately 7.6 million people died of cancer worldwide. The path to more successful cancer treatment lies in human DNA. More and more physicians are not only searching for changes in human tissue that signal cancer, but are increasingly interested in the alterations of the human genome itself.
Mitsui Knowledge Industries (MKI) is applying SAP HANA to this kind of cancer genome analysis. They begin by pre-processing DNA sequences from normal cells and comparing them with sequences from cancer cells. This pre-processing runs against large volumes of data in Hadoop clusters and can take anywhere from several days to a week.
Next, they move relevant data into SAP HANA, where they perform complex analytical processes to identify variants from the pre-processed sequences.
They also analyze what medicines might work against the mutated genes.
With SAP HANA, they take advantage of built-in predictive algorithm libraries (PAL) and integration with the open source R statistical tool to create predictive models to assess best treatment options for the patient.
Initially, MKI was using only Hadoop and R for analysis, but decided to add SAP HANA to reduce processing time so that they could deliver personalized results more quickly. MKI’s goal is to provide personalized treatments to patients as quickly as possible.
MKI still uses Hadoop to pre-process large volumes of DNA (normal and cancerous) so that they have a strong foundation of existing sequences. But they now use SAP HANA to analyze a particular patient’s DNA against related sequences from Hadoop to better predict the best medicines and treatment for the patient.
Hadoop is used to align the patient’s DNA sequence with the normal sequence because the data is in a semi-structured format and the work can be parallelized across multiple machines. In addition, the MKI team is able to use an open source package for aligning genomes.
Identifying the mutations and predicting the best treatment requires a lot of highly iterative analysis, which is ideally done in SAP HANA. As a result, MKI has been able to cut the overall time from two to three days down to 20 minutes. Furthermore, MKI believes that they can get it under 10 minutes when they deploy a 64-node Hadoop cluster and a 40-core HANA machine.
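The idea of identifying mutations by comparing a patient’s sequence with a normal one can be illustrated with a toy sketch. This is not MKI’s pipeline: it assumes the two sequences are already aligned and of equal length, the sample sequences are made up, and the function name `find_variants` is hypothetical. Real variant calling also has to handle insertions, deletions, and read quality.

```python
from typing import List, Tuple


def find_variants(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up example sequences, for illustration only.
normal = "ACGTACGTAC"
tumor = "ACGTTCGTAA"

variants = find_variants(normal, tumor)
print(variants)  # [(4, 'A', 'T'), (9, 'C', 'A')]
```

Each reported position is a candidate for the kind of iterative follow-up analysis described above.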
Arguably, big data matters even more where you need to analyze data in real time as it streams in.
Businesses can use big data to gain a 360-degree view of the customer by combining enterprise data with customer sentiment gleaned from social networks, customer service interactions, and web click-stream data. Service providers can proactively reach out to customers and keep them satisfied, loyal, and coming back for more.
A big data platform should meet the needs of all your stakeholders, from BI and analytic professionals to data scientists, to IT staff who help bring actionable insights to executive leadership, middle managers, and frontline workers, sometimes by even embedding those insights directly into business processes.
It is helpful to classify the different users into the three categories shown in Figure 2.
Figure 2. An IT landscape for big data, broken down by role
The business analyst provides the organization with precise, repeatable, accurate reporting on the data stored within the organization. They are supporting business operational decisions; it’s all about the statements of fact, answered instantly. The business analyst is focused on reporting on the one and only truth. Normally, the information analyzed comes from data generated in transactional systems, data that is highly structured. There is an emphasis on data quality, so that standardized reports can be executed with confidence that all of the numbers will line up. The information views of the data have well understood meanings and there is a focus on unambiguous determinations.
The business analyst is in essence looking for an enterprise data warehouse and the rigor that entails. However, current data warehouses don’t handle large datasets well. As a result, many data warehouses contain summary data, with much of the detailed information thrown away or stored in a highly complex BI landscape with many data marts, data caches, and generally many layers of technology. With SAP HANA, users can store all of the data without causing query response times to grind to a halt. This also makes it possible to remove the many data marts and data caches that were originally put in place to compensate for poor performance, greatly simplifying the data warehouse.
In some cases, business analysts may benefit from access to Hadoop environments. If the data in Hadoop needs to be reported on, you may want to bring it into your enterprise data warehouse. Otherwise, it should be carefully structured and stored in Hadoop. Here’s the key. The BI analyst uses GUI-based tools to access information and to generate reports. This requires data to be organized and structured so that it is easily accessible and reports can be generated using forms. Projects like Hive and Pig help to do that for your Hadoop environment.
SAP BusinessObjects BI can now access Hadoop environments through Hive. With Hive, you define table structures for data stored in Hadoop. BI analysts can use the SAP BusinessObjects BI tools to create reports and dashboards and to explore data, all inside Hadoop. BusinessObjects translates the users’ actions into HiveQL commands, a language modeled after SQL. What is particularly powerful with BusinessObjects is that if the BI administrator has created the right universes, or access layers, the BI analyst can query data across various systems. In other words, you can take data from Hadoop and combine it with data from other data sources.
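As a rough illustration of what defining a table structure over data stored in Hadoop looks like, the sketch below assembles a HiveQL `CREATE EXTERNAL TABLE` statement and a simple query as plain strings. The table name `maintenance_events`, its columns, the HDFS path, and the helper `build_hive_table_ddl` are all hypothetical; this is not what BusinessObjects generates internally, only an example of the kind of HiveQL involved.

```python
from typing import Dict


def build_hive_table_ddl(table: str, columns: Dict[str, str], location: str) -> str:
    """Build a HiveQL statement that maps a table onto delimited files in HDFS."""
    column_list = ",\n  ".join(
        f"{name} {hive_type}" for name, hive_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and path, used only for illustration.
ddl = build_hive_table_ddl(
    table="maintenance_events",
    columns={"vehicle_id": "STRING", "event_type": "STRING", "cost": "DOUBLE"},
    location="/data/maintenance_events",
)

# A SQL-like query a reporting tool might issue once the table is defined.
query = (
    "SELECT vehicle_id, COUNT(*) AS event_count "
    "FROM maintenance_events GROUP BY vehicle_id"
)

print(ddl)
print(query)
```

Once a table like this exists, the files in Hadoop can be queried with SQL-like statements, which is what makes them reachable from GUI-based reporting tools.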
SAP IQ is an important platform to support the BI analyst. IQ is a disk-based bulk data store optimized for analytics. It can be used along with or even in place of Hadoop.
Of course, a critical step while providing the BI analyst access to Hadoop is to define what tables, columns, and so on are accessible and the relationships between them. SAP BusinessObjects BI gives the administrator the tools needed to do just that, including for a Hive implementation.
Here’s the key: BI analysts need carefully controlled, structured access to Hadoop environments from their GUI tools.
The data scientist is a role that is increasingly valuable to technology businesses and is instrumental in using predictive analytics to impact the bottom line.
In contrast, data scientists work at the other end of the information certainty spectrum. They deal with the uncertainty inherent in any large, complex organization and seek to draw conclusions that are statistically relevant but not completely certain. One example is predictive analytics, where large amounts of data are fed into models in order to predict what the future may hold. The data scientist may create custom systems to explore and probe the corporate data store and must be equipped with tools that interpret unstructured data and make sense of it for the organization and the problem domain.
The data scientist therefore requires as much flexibility as possible. The business analyst is skilled at using BI tools and understanding how the data applies to the business while the data scientist usually has very technical skills. Data scientists typically decide which tool to use based on the data that offers the most promise. They may choose a data mining technique, or techniques, and then select the tools that support the technique, such as the R statistical language, which is supported in SAP HANA.
While the BI analyst needs structure in a controlled environment, the data scientist wants a lot of freedom and flexibility. Depending on the analysis performed, they want to be able to run their algorithms in Hadoop using MapReduce algorithms, in-memory, or in the database using in-database analytic algorithms.
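For readers unfamiliar with the MapReduce style mentioned above, here is a minimal single-process sketch of its three phases (map, shuffle, reduce), counting words in a few made-up maintenance notes. It only imitates the programming model in plain Python; it does not use Hadoop, and the function names are hypothetical. In a real cluster, the map and reduce steps run in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up input records, for illustration only.
notes = ["brake repair", "oil change", "brake inspection"]

counts = reduce_phase(shuffle_phase(map_phase(notes)))
print(counts["brake"])  # 2
```

Because each record is mapped independently and each key is reduced independently, the same logic can be spread across a cluster without changing its structure.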
Operational users are involved in the day-to-day operation of core business processes. They are the frontline workers such as call center operators, marketing campaign managers, warehouse personnel, and sales representatives. Operational users can benefit from information that helps them make decisions in the moment, often based on insights uncovered by business analysts and data scientists. This information is often delivered in the form of dashboards, daily reports, or even predictive models embedded in enterprise applications. The challenge of real-time analysis is to feed automated insights back into the decision loop fast enough to guide the action of the human or the machine in making crucial decisions.
Operational users are typically not technical, nor do they have experience with analytic and reporting tools. In essence, the solution requires user interfaces suited to how operational users need to consume the information.