HDInsight Of Azure

Big data refers to very large volumes of information. Hadoop is an open source, Java-based programming framework that supports the processing and storage of big data. A computer cluster is a set of connected computers that work together as a single system. Hadoop clusters are computer clusters that store and analyse big data, both structured and unstructured. Azure HDInsight deploys Hadoop clusters in the cloud using the Hortonworks Data Platform (HDP) Hadoop distribution. It also includes Apache HBase, a tabular NoSQL database that provides real-time access to data in HDFS, and Apache Storm, a stream analytics platform for processing real-time events such as sensor readings.

Features of Azure HDInsight

  • It is mainly used to create, manage, and analyse reports and statistics on big data.
  • With the help of virtual machines, you can quickly deploy the system from the Azure portal.
  • You can choose the number of nodes in a cluster.
  • You pay only for the service you use.
  • When a specific job is completed, you can reuse the cluster for another job or stop using it.
  • The HDInsight service can work with non-Microsoft vendors.
  • It is cost-effective to collect and store structured or unstructured data.
  • Extracting undiscovered information from large quantities of unstructured data is easy.
  • A Hadoop cluster can be built within minutes.
  • A RESTful API performs create, read, update, and delete (CRUD) operations on text or binary data, such as video, audio, and images supplied by the client.
  • The flat network storage technology offers high-speed connections between the nodes and the Blob storage system.
  • The master-slave pattern of HDInsight allows a central (master) node to operate and control the cluster centrally. The secondary nodes are integrated into Azure deployments.
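
The RESTful CRUD access mentioned in the feature list can be pictured as a mapping from operations to HTTP requests. The following is a minimal sketch, assuming the Azure Blob storage URL layout https://<account>.blob.core.windows.net/<container>/<blob>. The account, container, and blob names are hypothetical, and authentication (a shared key header or SAS token) is left out on purpose, so the function only describes a request and never sends one.

```python
from dataclasses import dataclass, field

# Assumed mapping of CRUD operations to HTTP verbs for a blob endpoint.
# Create and update both upload the full blob content with PUT.
CRUD_TO_HTTP = {
    "create": "PUT",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


@dataclass
class BlobRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def build_blob_request(operation, account, container, blob_name):
    """Describe the HTTP request for one CRUD operation on a blob.

    Nothing is sent over the network; authentication is omitted.
    """
    method = CRUD_TO_HTTP[operation]  # raises KeyError for unknown operations
    url = f"https://{account}.blob.core.windows.net/{container}/{blob_name}"
    headers = {}
    if method == "PUT":
        # Uploads must state the blob type; block blobs suit files
        # such as video, audio, or images.
        headers["x-ms-blob-type"] = "BlockBlob"
    return BlobRequest(method, url, headers)


# Hypothetical names, for illustration only.
request = build_blob_request("create", "examplestore", "media", "scan-001.png")
print(request.method, request.url)
```

In a real client the returned description would be signed and passed to an HTTP library; keeping the request construction separate makes it easy to inspect before anything is sent.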

The insights provided by Azure HDInsight are:

  • Disk Usage
  • Utilization of CPU
  • Cluster Load
  • Memory Used
  • Network Used
  • AJAX calls of a website, such as the number of views and the number of clicks on a particular event.

Azure HDInsight Storage Services

Azure Storage is a general-purpose storage system connected to the compute nodes. By storing the data in Azure Storage, you get the benefits of data sharing, data archiving, geo-replication, and elastic scaling. These enable data recovery and redundancy. The scale-out file system scales automatically with the number of nodes connected to the cluster. There is no need to reload the data every time a cluster is created. Even after the original HDInsight cluster is deleted, you can still use the default storage container.
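
Because the data lives in the storage account rather than on the cluster nodes, jobs refer to it by a storage URI instead of a node-local path. The sketch below assumes the wasb:// scheme used by Hadoop's Azure Storage driver; the container and account names are hypothetical.

```python
def wasb_uri(container, account, path="", secure=False):
    """Build a URI for data held in an Azure Storage container.

    Format: wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    """
    scheme = "wasbs" if secure else "wasb"
    clean_path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{clean_path}"


# The URI depends only on the storage account and container, not on any
# particular cluster, so a newly created cluster can read the same data.
print(wasb_uri("clusterdata", "examplestore", "/logs/2018/01.csv"))
```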

Limitations of Storage Services

  • A storage account located in a different location from the HDInsight cluster is not supported.
  • Blob storage accounts are not supported.
  • Sharing the default Blob container among multiple HDInsight clusters might corrupt cluster-specific information stored in the container, such as job history.
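
Two of these limitations, the location mismatch and the shared default container, can be checked before a cluster is created. The following is a rough sketch of such a check, not part of any Azure SDK: the ClusterStorage record, its field names, and the example values are all hypothetical, and the account-type limitation is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStorage:
    # Hypothetical record describing one planned cluster.
    cluster_name: str
    cluster_location: str
    storage_account: str
    storage_location: str
    default_container: str


def find_storage_problems(clusters):
    """Return warnings for location mismatches and shared default containers."""
    problems = []
    owners = {}  # (account, container) -> first cluster seen using it
    for c in clusters:
        if c.cluster_location != c.storage_location:
            problems.append(
                f"{c.cluster_name}: storage account is in {c.storage_location}, "
                f"cluster is in {c.cluster_location}"
            )
        key = (c.storage_account, c.default_container)
        first_owner = owners.setdefault(key, c.cluster_name)
        if first_owner != c.cluster_name:
            problems.append(
                f"{c.cluster_name}: shares default container "
                f"'{c.default_container}' with {first_owner}"
            )
    return problems


planned = [
    ClusterStorage("etl", "westus", "examplestore", "westus", "data"),
    ClusterStorage("reports", "eastus", "examplestore", "westus", "data"),
]
for warning in find_storage_problems(planned):
    print(warning)
```

Giving each cluster its own default container while keeping shared input data in a separate container is one way to avoid the corruption risk described in the last limitation.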

Live Scenario on Azure HDInsight

Let us consider a healthcare monitoring development and operational cycle.

This cycle is a healthcare monitoring process of the kind found in any hospital. Using Azure HDInsight, you can monitor each process and the status of the servers at regular intervals, and see any faults and errors that occur. Azure HDInsight is deployed in the healthcare product.

After you register the hospital application in the Azure portal and start running it, you get the overall performance of the healthcare application, as shown in the figure below.

Fig: Overall Application Performance Metrics

It shows browser metrics such as page views, page load time, requests on each page, and each session.

It also shows the failures and errors that occurred while performing a task in the application, such as server exceptions, page faults, and data dependency failures.

Fig: Failures Metrics

