Before exploring the capabilities of Apache Spark and analyzing the use cases where it fits best, we need to spend some time learning what Apache Spark is about. Apache Spark has emerged as one of the biggest and strongest big data technologies in a short span of time. It is an open-source alternative to MapReduce for building and running fast, secure applications on Hadoop. Spark comes with libraries of machine learning and graph algorithms, and supports real-time streaming and SQL applications through Spark Streaming and Shark (since superseded by Spark SQL), respectively. With these details at hand, let us take some time to understand the most common use cases of Apache Spark, split by industry for easier reading.
Every innovation in the technology space that meets the current requirements of organizations should be good enough to be tested against real use cases from the marketplace. New products that hit the market deserve rigorous analysis and a proper approach, especially when they arrive at the right time with few alternatives. Thinking about this, you might have the following questions running through your mind:
* Looking at Apache Spark, you might want to understand why it is deployed in the first place.
* You might also wonder where it will stand in the crowded marketplace.
* How would it fare in a competitive world where alternatives give it tough competition?
All these questions will be answered shortly as we go through the main deployment areas, which show how well Apache Spark handles these use cases. Let us take a look at some industry-specific Apache Spark use cases that have demonstrated its ability to build and run fast big data applications:
Banks have started using alternatives to Hadoop MapReduce, such as Spark, to access and analyze social media profiles, call recordings, complaint logs, emails and similar data, in order to provide a better customer experience and excel in the areas where they want to grow. This also helps them make the right business decisions in areas such as credit risk assessment, targeted advertising and customer segmentation.
One of the best examples is cross-checking your payments: if they happen at an alarming rate, or from geographical locations so far apart that no single individual could physically travel between them in the time available, such fraudulent cases can be identified using technologies like Apache Spark. This is just the beginning of what Apache Spark can do, provided it is given access to the necessary data. Banks have also built models to identify fraudulent transactions and deployed them in batch environments to detect and stop such transactions.
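The impossible-travel check described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not any bank's or Apache Spark's actual implementation: the `Payment` record, the `flag_impossible_travel` function and the 900 km/h speed threshold are all assumptions. It sorts payments by card and time, then flags consecutive payments on the same card whose implied travel speed (great-circle distance divided by elapsed time) is physically implausible. In a real Spark job, the same per-card comparison could be expressed with a window function such as `lag`, partitioned by card and ordered by timestamp.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # hypothetical threshold, roughly airliner cruising speed


@dataclass
class Payment:
    card_id: str
    timestamp_s: float  # seconds since epoch
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flag_impossible_travel(payments, max_speed_kmh: float = MAX_SPEED_KMH):
    """Return (previous, current) pairs of payments on the same card whose
    implied travel speed exceeds max_speed_kmh."""
    flagged = []
    last_seen = {}  # card_id -> most recent Payment seen for that card
    for p in sorted(payments, key=lambda p: (p.card_id, p.timestamp_s)):
        prev = last_seen.get(p.card_id)
        if prev is not None:
            hours = (p.timestamp_s - prev.timestamp_s) / 3600.0
            km = haversine_km(prev.lat, prev.lon, p.lat, p.lon)
            # Zero elapsed time with a nonzero distance counts as infinitely fast.
            if hours > 0:
                speed = km / hours
            else:
                speed = float("inf") if km > 0 else 0.0
            if speed > max_speed_kmh:
                flagged.append((prev, p))
        last_seen[p.card_id] = p
    return flagged
```

For example, a payment in London followed 30 minutes later by one in New York on the same card implies a speed of roughly 11,000 km/h and would be flagged, while two payments a few kilometres apart within an hour would not.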
* Many banks have invested heavily in Apache Spark to get a unified view of an individual or an organization, so that they can target their products based on each customer's usage and requirements. Combined with machine learning, Apache Spark can analyze an individual's spending and predict which suggestions the bank's marketing department should make to introduce the customer to newer products.
* Banking firms use analytic results to identify patterns in what is happening, decide how much to invest and where, and gauge how strong the competition is in a given area of business.
Information related to real-time transactions can further be passed to streaming clustering algorithms such as K-means, or to collaborative filtering algorithms such as Alternating Least Squares (ALS). The results can then be combined with data from other sources, like social media and forums, to make recommendations to consumers based on the latest trends.
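To make the streaming clustering idea concrete, here is a minimal pure-Python sketch of one mini-batch update of streaming K-means, following the decay-weighted update rule described in the Spark MLlib documentation for `StreamingKMeans`. It is an illustration under assumptions, not Spark's own code: the function names and the two-dimensional transaction features (for example, amount and number of items) are hypothetical. Each incoming point is assigned to its nearest center, and each center then moves toward the mean of its newly assigned points, weighted against the decayed number of points it has already absorbed.

```python
from math import dist


def closest(centers, point):
    """Index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(centers[i], point))


def streaming_kmeans_update(centers, weights, batch, decay=1.0):
    """Apply one mini-batch update and return new (centers, weights).

    Update rule per cluster, with n = old weight, a = decay,
    m = points assigned in this batch and x = their mean:
        c_new = (c * n * a + x * m) / (n * a + m)
        n_new = n * a + m
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign every point in the batch to its nearest current center.
    for x in batch:
        i = closest(centers, x)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += x[d]

    new_centers, new_weights = [], []
    for i in range(k):
        m = counts[i]
        n = weights[i] * decay  # decayed weight of previously seen data
        if m == 0:
            # No new points: keep the center, but its old weight still decays.
            new_centers.append(list(centers[i]))
            new_weights.append(n)
            continue
        batch_mean = [s / m for s in sums[i]]
        total = n + m
        new_centers.append([(c * n + xm * m) / total for c, xm in zip(centers[i], batch_mean)])
        new_weights.append(total)
    return new_centers, new_weights
```

A `decay` of 1.0 weights all past data equally, while a value close to 0 lets the centers follow only the most recent batches, which suits spending patterns that shift quickly.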
* Apache Spark at Alibaba:
The world's leading e-commerce giant, Alibaba, runs huge Apache Spark jobs to analyze petabytes of data generated on its own e-commerce platforms. The interactions of the millions of users on the platform are represented as complex graphs, which are then processed by sophisticated machine learning jobs using Apache Spark.
* Apache Spark at eBay:
Another giant that has led this industry for a long time is eBay. eBay uses Apache Spark to provide targeted offers to customers based on their earlier experiences, and leaves no stone unturned in enhancing their experience. This not only improves the customer experience by proactively offering what customers might need, but also helps eBay handle customers' time on the e-commerce site efficiently and smoothly. eBay achieves this by running Apache Spark on Hadoop YARN.
The healthcare industry is the newest to adopt more and more use cases built on advanced technologies in order to provide world-class facilities to patients. Apache Spark is gaining attention as the heartbeat of many healthcare applications. Hospitals have turned to Apache Spark to analyze patients' past medical history and identify possible health issues. Let us take a look at some of these use cases:
* Apache Spark at MyFitnessPal:
MyFitnessPal, one of the largest health and fitness portals, helps people achieve a healthy lifestyle through proper diet and exercise. The portal uses the data provided by its users to identify high-quality food items, passing these details to Apache Spark to produce the best suggestions. In this use case, Apache Spark was able to scan through the food calorie details of 80+ million users.
* Apache Spark at PSL:
Many software vendors have taken up the cause of analyzing patients' past medical history to suggest better food habits and applicable medications that help them avoid future medical problems. Patients with a history of diabetes, cardiovascular issues, cervical cancer and similar conditions have taken advantage of such services, allowing these conditions to be identified earlier and treated properly.
Apache Spark has created a huge wave of good vibes in the gaming industry, where it is used to identify patterns in real-time user events and act on lucrative opportunities such as automatic adjustment of game levels, targeted marketing and player retention. Many video sharing services use Apache Spark together with NoSQL databases such as MongoDB to show relevant advertisements to their users based on the videos they watch and share and on their other activity.
* Apache Spark at Yahoo:
Apache Spark has found a notable customer in Yahoo, which uses it to personalize its web content for targeted advertising. Machine learning algorithms running on Apache Spark identify the news topics that users are interested in, such as trending articles, based on how users access Yahoo News services. Earlier machine learning algorithms for news personalization required around 20,000 lines of C/C++ code, but with Apache Spark and Scala, the algorithms have been cut down to around 150 lines of code.
* Apache Spark at Conviva:
Conviva, one of the leading video streaming companies, uses Apache Spark to deliver service at the best possible quality to its customers. It achieves this by eliminating screen buffering and by learning in great detail what content should be shown to whom, and when, to make it most beneficial. All of this has been built into its video player to manage live video traffic from around 4 billion video feeds every month.
* Apache Spark at Netflix:
Another name that is even more popular in the same space is Netflix. Netflix uses Apache Spark to process real-time streams and provide better online recommendations to customers based on their viewing history. Netflix captures event data from its streaming devices and then uses Apache Spark's machine learning capabilities to provide efficient recommendations. Netflix is known to process at least 450 billion events a day, which flow to server-side applications through Apache Kafka.
* Apache Spark at Pinterest:
Pinterest is another interesting brand that uses Apache Spark to discover emerging trends in user engagement. This allows it to react to the latest trends in real time by performing an in-depth analysis of user behavior on its website.
* Apache Spark at TripAdvisor:
TripAdvisor, a mammoth organization in the travel industry, helps users plan their perfect trips (whether official or personal) and has used the capabilities of Apache Spark to speed up its customer recommendations. It queries thousands of providers for rates on a specific route, helping users identify the best service at the best available price from a plethora of service providers. TripAdvisor has also used Apache Spark to analyze and process hotel reviews into a readable format.
Apache Spark is used by many big names today, including Uber and Pinterest. These organizations gather terabytes of event data from their users' day-to-day usage and interact with that data in real time. By doing so, they derive the insights they need to consistently maintain a smooth, high-quality customer experience.