
Hadoop Heartbeat and Data Block Rebalancing


HDFS Data Storage Reliability

  • The most important objective of HDFS is to store data reliably, even when failures occur in NameNodes, DataNodes, or network partitions
  • Detection is the first step HDFS takes to overcome failures; HDFS uses heartbeat messages to detect connectivity between the NameNode and DataNodes

Hadoop Heartbeat

  • Several things can cause loss of connectivity between the NameNode and a DataNode, so each DataNode sends periodic heartbeat messages to its NameNode; the NameNode detects a loss of connectivity when it stops receiving them
  • The NameNode marks DataNodes that do not respond to heartbeats as dead and refrains from sending further requests to them
  • Data stored on a dead DataNode is no longer available to HDFS clients from that node, which is effectively removed from the system.
  • If the death of a node causes the replication factor of any data blocks to drop below their minimum value, the NameNode initiates additional replication to restore the normal state.
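The bookkeeping described above can be sketched as a small simulation. This is an illustrative model only, not Hadoop code: the class, method names, and the timeout value are assumptions chosen to show how a NameNode could mark DataNodes dead once their heartbeats stop arriving.

```python
import time

# Illustrative timeout: seconds without a heartbeat before a node is marked dead.
# (Hadoop derives its own expiry interval from configuration; this value is an assumption.)
HEARTBEAT_TIMEOUT = 30.0

class NameNode:
    """Hypothetical sketch of a NameNode's heartbeat tracking."""

    def __init__(self):
        self.last_heartbeat = {}   # DataNode id -> timestamp of last heartbeat
        self.dead_nodes = set()

    def receive_heartbeat(self, node_id, now=None):
        # A fresh heartbeat revives a node and updates its timestamp.
        self.last_heartbeat[node_id] = now if now is not None else time.time()
        self.dead_nodes.discard(node_id)

    def check_liveness(self, now=None):
        # Mark any node whose last heartbeat is older than the timeout as dead,
        # so no further requests are sent to it.
        now = now if now is not None else time.time()
        for node_id, ts in self.last_heartbeat.items():
            if now - ts > HEARTBEAT_TIMEOUT:
                self.dead_nodes.add(node_id)
        return self.dead_nodes

nn = NameNode()
nn.receive_heartbeat("dn1", now=0.0)
nn.receive_heartbeat("dn2", now=0.0)
nn.receive_heartbeat("dn1", now=40.0)   # dn2 never reports again
print(nn.check_liveness(now=45.0))      # dn2 has exceeded the timeout
```

In the real system, discovering a dead node also triggers the re-replication step described above whenever a block's replica count drops below its minimum.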


The HDFS Heartbeat Process

[Diagram: The HDFS heartbeat process]


Data Block Rebalancing:

HDFS data blocks might not always be placed uniformly across DataNodes, which means the disk space on one or more DataNodes can be underutilized.

HDFS supports rebalancing data blocks using various models:

  • One model might move data blocks from one DataNode to another automatically if the free space on a DataNode falls too low.
  • Another model might dynamically create additional replicas and rebalance other data blocks in a cluster if a sudden increase in demand for a given file occurs.
  • HDFS also provides the Hadoop balancer command for manual rebalancing tasks. A common reason to rebalance is the addition of new DataNodes to a cluster. When placing new blocks, the NameNode considers various parameters before choosing the DataNodes to receive them.

Some of the considerations are:

  1. Block-replica placement policies
  2. Prevention of data loss due to rack failure
  3. Reduction of cross-rack network I/O
  4. Uniform data spread across DataNodes in a cluster

The cluster-rebalancing feature of HDFS is just one mechanism it uses to sustain the integrity of its data.
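The decision the balancer makes can be sketched as a simple threshold test: a DataNode is a rebalancing candidate when its utilization deviates from the cluster average by more than a configured percentage. The function and parameter names below are illustrative assumptions, not part of the Hadoop API.

```python
# Hypothetical sketch of the balancer's threshold test (names are assumptions).
def needs_rebalancing(node_used, node_capacity,
                      cluster_used, cluster_capacity,
                      threshold=0.10):
    """Return True when this node's utilization deviates from the
    cluster-average utilization by more than the threshold (10% here)."""
    node_util = node_used / node_capacity
    cluster_util = cluster_used / cluster_capacity
    return abs(node_util - cluster_util) > threshold

# A freshly added, nearly empty DataNode in a half-full cluster is a
# candidate to receive blocks:
print(needs_rebalancing(node_used=5, node_capacity=100,
                        cluster_used=500, cluster_capacity=1000))  # True
```

For manual rebalancing, current Hadoop releases expose this as the `hdfs balancer` command, whose `-threshold` option takes the same deviation as a percentage, e.g. `hdfs balancer -threshold 10`.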



Last updated: 24 March 2023
About Author
Ruchitha Geebu

I am Ruchitha, working as a content writer for MindMajix Technologies. My writing focuses on the latest technical software, tutorials, and innovations. I also research AI and neuromarketing. I am a media post-graduate from BCU, Birmingham, UK. Previously, my writing focused on business articles about digital marketing and social media. You can connect with me on LinkedIn.
