Hadoop Heartbeat and Data Block Rebalancing


HDFS Data Storage Reliability

  • A primary objective of HDFS is to store data reliably, even when failures occur on NameNodes or DataNodes, or when the network becomes partitioned
  • Detection is the first step HDFS takes to overcome failures; it uses heartbeat messages to detect loss of connectivity between the NameNode and DataNodes

Hadoop Heartbeat

  • Several things can cause loss of connectivity between the NameNode and a DataNode, so each DataNode sends periodic heartbeat messages to its NameNode; the NameNode detects loss of connectivity when those heartbeats stop arriving
  • The NameNode marks DataNodes that stop responding to heartbeats as dead and refrains from sending further requests to them
  • Data stored on a dead DataNode is no longer served to HDFS clients from that node, which is effectively removed from the system.
  • If the death of a node causes the replication factor of any data blocks to drop below the configured minimum, the NameNode initiates additional replication to bring those blocks back to a normal state.
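As a rough illustration of the liveness bookkeeping the heartbeat mechanism drives, the sketch below uses the HDFS Java client to ask the NameNode for its live and dead DataNodes. It assumes the Hadoop client libraries are on the classpath, and the NameNode URI hdfs://namenode:8020 is a placeholder for your own cluster address. By default a DataNode heartbeats every 3 seconds (dfs.heartbeat.interval), and a node that stays silent for roughly 10.5 minutes (governed by dfs.namenode.heartbeat.recheck-interval) is declared dead.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

import java.net.URI;

public class HeartbeatStatus {
    public static void main(String[] args) throws Exception {
        // hdfs://namenode:8020 is a placeholder; point this at your own NameNode
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // LIVE nodes are those whose heartbeats the NameNode is still receiving
        for (DatanodeInfo node : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
            System.out.printf("LIVE %s  last heartbeat %d ms ago%n",
                    node.getHostName(),
                    System.currentTimeMillis() - node.getLastUpdate());
        }

        // DEAD nodes have been silent past the configured timeout and receive no new requests
        for (DatanodeInfo node : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
            System.out.println("DEAD " + node.getHostName());
        }
        dfs.close();
    }
}
```

The same liveness report is also available from the command line with hdfs dfsadmin -report.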
[Diagram: The HDFS heartbeat process]

Data Block Rebalancing

HDFS data blocks might not always be placed uniformly across DataNodes, which means the disk space on one or more DataNodes can be underutilized.

HDFS supports rebalancing data blocks using various models:
 
 
  1. One model might move data blocks from one DataNode to another automatically if the free space on a DataNode falls too low.
  2. Another model might dynamically create additional replicas and rebalance other data blocks in a cluster if demand for a given file suddenly increases.
  3. HDFS also provides the hdfs balancer command (formerly hadoop balancer) for manual rebalancing tasks. The most common reason to rebalance is the addition of new DataNodes to a cluster. When placing new blocks, the NameNode considers various parameters before choosing the DataNodes to receive them.
 
Some of the considerations are:
  1. Block-replica writing policies
  2. Prevention of data loss due to installation or rack failure
  3. Reduction of cross-installation network I/O
  4. Uniform spread of data across the DataNodes in a cluster
The cluster-rebalancing feature of HDFS is just one mechanism it uses to sustain the integrity of its data.
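A quick way to see whether a cluster actually needs rebalancing is to compare each DataNode's DFS utilization against the cluster-wide average; the balancer works from the same idea, moving blocks until every node sits within a threshold (10% by default) of that average. The sketch below performs that comparison with the HDFS Java client; the NameNode URI and the 10.0 threshold are placeholders, not values taken from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

import java.net.URI;

public class RebalanceCheck {
    public static void main(String[] args) throws Exception {
        // hdfs://namenode:8020 and the 10.0 percentage-point threshold are placeholders
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        DatanodeInfo[] nodes = dfs.getDataNodeStats();

        // Cluster-wide average utilization = total DFS space used / total capacity
        long totalUsed = 0, totalCapacity = 0;
        for (DatanodeInfo node : nodes) {
            totalUsed += node.getDfsUsed();
            totalCapacity += node.getCapacity();
        }
        if (totalCapacity == 0) {
            System.out.println("No DataNode capacity reported");
            return;
        }
        double clusterAvg = 100.0 * totalUsed / totalCapacity;

        // Flag nodes whose utilization deviates from the average by more than the
        // threshold, which is roughly the skew the balancer tries to eliminate
        double threshold = 10.0;
        for (DatanodeInfo node : nodes) {
            double used = node.getDfsUsedPercent();
            String flag = Math.abs(used - clusterAvg) > threshold
                    ? "  <- candidate for rebalancing" : "";
            System.out.printf("%-30s %6.2f%% used (cluster avg %.2f%%)%s%n",
                    node.getHostName(), used, clusterAvg, flag);
        }
        dfs.close();
    }
}
```

Actual rebalancing is then a matter of running hdfs balancer after the new DataNodes have joined the cluster, optionally with -threshold to tighten or loosen the default 10% band.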
