
Hadoop Heartbeat and Data Block Rebalancing


HDFS Data Storage Reliability

  • An important objective of HDFS is to store data reliably, even when failures occur in the NameNode, in DataNodes, or in network partitions
  • Detection is the first step HDFS takes to overcome failures, and HDFS uses heartbeat messages to detect connectivity between the NameNode and DataNodes

Hadoop Heartbeat

  • Several things can cause loss of connectivity between the NameNode and a DataNode, so each DataNode sends periodic heartbeat messages to the NameNode; the NameNode can then detect a loss of connectivity when the heartbeats stop arriving
  • The NameNode marks DataNodes that stop responding to heartbeats as dead and refrains from sending further requests to them
  • Data stored on a dead DataNode is no longer available to an HDFS client from that node, which is effectively removed from the system.
  • If the death of a node causes the replication factor of any data blocks to drop below their minimum value, the NameNode initiates additional replication to bring them back to the required level.
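As a rough illustration of how this detection is tuned, the heartbeat period and the NameNode's recheck interval are standard properties in hdfs-site.xml (dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval). With the common defaults of 3 seconds and 300,000 ms, a DataNode is declared dead after roughly 2 × recheck-interval + 10 × heartbeat-interval, i.e. about 10 minutes 30 seconds; exact defaults can vary between Hadoop releases. The commands below are a minimal sketch, run against an existing cluster, of how an administrator can inspect these settings and the NameNode's view of the DataNodes:

    # Read the heartbeat-related settings from the active configuration
    hdfs getconf -confKey dfs.heartbeat.interval                   # DataNode heartbeat period (seconds)
    hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval  # NameNode recheck period (milliseconds)

    # List DataNodes as the NameNode currently sees them, including dead nodes
    hdfs dfsadmin -report

    # After a node is marked dead, check for under-replicated or missing blocks
    hdfs fsck / -blocks -locations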
[Diagram: The HDFS heartbeat process]

Data Block Rebalancing

HDFS data blocks might not always be placed uniformly across DataNodes, which means the space on one or more DataNodes can be underutilized.
 
HDFS supports rebalancing data blocks using various models:
 
 
  1. One model might move data blocks from one DataNode to another automatically if the free space on a DataNode falls too low.
  2. Another model might dynamically create additional replicas and rebalance other data blocks in a cluster if a sudden increase in demand for a given file occurs.
  3. HDFS also provides the hadoop balancer command for manual rebalancing tasks (an example invocation appears at the end of this section). The most common reason to rebalance is the addition of new DataNodes to a cluster. When placing new blocks, the NameNode considers various parameters before choosing the DataNodes to receive them.
 
Some of the considerations are:
  1. Block-replica writing policies
  2. Prevention of data loss due to installation or rack failure
  3. Reduction of cross-installation network I/O
  4. Uniform data spread across DataNodes in a cluster
The cluster-rebalancing feature of HDFS is just one mechanism it uses to sustain the integrity of its data.
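For reference, the manual rebalancing mentioned above is started from the command line; a minimal sketch of an invocation (the threshold value here is just an example and should be chosen for your cluster) is:

    # Move blocks until every DataNode's utilization is within 10 percentage
    # points of the cluster-wide average utilization
    hdfs balancer -threshold 10

    # On older Hadoop 1.x installations the equivalent command is:
    hadoop balancer -threshold 10

The balancer runs until the cluster is balanced to within the given threshold, no more blocks can be moved, or it is stopped, so it is commonly run after adding new DataNodes and left to finish in the background.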


Ravindra Savaram
About The Author

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

