HDFS Data Storage Reliability
- One important objective of HDFS is to store data reliably, even when failures occur within Name Nodes, data nodes, or network partitions.
- Detection is the first step HDFS takes to overcome failures. HDFS uses heartbeat messages to detect connectivity between name and data nodes.
- Several things can cause a loss of connectivity between name and data nodes. Each data node therefore sends periodic heartbeat messages to its Name Node, so the Name Node can detect a loss of connectivity when the messages stop arriving.
- The Name Node marks data nodes that do not respond to heartbeats as dead and refrains from sending further requests to them.
- Data stored on a dead node is no longer available to an HDFS client from that node, which is effectively removed from the system.
- If the death of a node causes the replication factor of any data blocks to drop below the minimum value, the Name Node initiates additional replication to return the replication factor to a normal state.
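
The detection and re-replication steps above can be sketched in a few lines of Python. This is a minimal illustration, not HDFS code: the class `NameNodeSketch`, its fields, and the timeout and minimum-replica values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Toy model of heartbeat tracking (all names here are hypothetical)."""
    timeout_s: float      # assumed: seconds of silence before a node counts as dead
    min_replicas: int     # assumed: minimum replication factor per block
    last_seen: dict = field(default_factory=dict)   # data node -> time of last heartbeat
    block_map: dict = field(default_factory=dict)   # block id -> set of nodes holding it

    def heartbeat(self, node, now):
        # Record that a heartbeat arrived from this data node.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # A node is considered dead once its last heartbeat is older than the timeout.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout_s}

    def under_replicated(self, now):
        # For each block, count replicas on live nodes and report how many are missing.
        dead = self.dead_nodes(now)
        missing = {}
        for block, holders in self.block_map.items():
            live = holders - dead
            if len(live) < self.min_replicas:
                missing[block] = self.min_replicas - len(live)
        return missing


nn = NameNodeSketch(timeout_s=30, min_replicas=2)
nn.block_map = {"blk_1": {"dn1", "dn2"}, "blk_2": {"dn2", "dn3"}}
for node in ("dn1", "dn2", "dn3"):
    nn.heartbeat(node, now=0)
# dn2 goes silent; the other two nodes keep reporting.
nn.heartbeat("dn1", now=40)
nn.heartbeat("dn3", now=40)
dead = nn.dead_nodes(now=50)
to_replicate = nn.under_replicated(now=50)
```

In this toy run, `dn2` is the only node whose last heartbeat is older than the timeout, so both blocks that had a replica on it are reported as needing one more copy.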
The HDFS heartbeat process Diagram
Data Block Rebalancing:
HDFS data blocks might not always be placed uniformly across data nodes, which means that the space on one or more data nodes can be underutilized.
HDFS supports rebalancing data blocks using various models:
- One model might move data blocks from one data node to another automatically if the free space on a data node falls too low.
- Another model might dynamically create additional replicas and rebalance other data blocks in a cluster if a sudden increase in demand for a given file occurs.
- HDFS also provides the Hadoop balancer command for manual rebalancing tasks. A common reason to rebalance is the addition of new data nodes to a cluster.
When placing new blocks, Name Nodes consider various parameters before choosing the data nodes to receive them. Some of the considerations are:
- Block-replica writing policies
- Prevention of data loss due to installation or rack failure
- Reduction of cross-installation network I/O
- Uniform data spread across data nodes in a cluster
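
To make these considerations concrete, here is a small sketch of how a placement routine could weigh them. It is an assumption-laden illustration rather than the actual HDFS placement policy: the function `choose_targets`, the node names, and the utilization figures are hypothetical.

```python
def choose_targets(nodes, writer_rack, replicas=3):
    """Pick data nodes for a new block (hypothetical helper, not an HDFS API).

    nodes maps a node name to a (rack, utilization) pair, where utilization
    is the fraction of the node's space already in use.
    """
    # Prefer less-utilized nodes to keep data spread evenly; break ties by name.
    by_load = sorted(nodes, key=lambda n: (nodes[n][1], n))
    chosen = []

    # First replica: least-loaded node on the writer's rack, limiting cross-rack I/O.
    local = [n for n in by_load if nodes[n][0] == writer_rack]
    if local:
        chosen.append(local[0])

    # Second replica: least-loaded node on another rack, so one rack failure
    # cannot remove every copy of the block.
    remote = [n for n in by_load if nodes[n][0] != writer_rack]
    if remote and len(chosen) < replicas:
        chosen.append(remote[0])

    # Remaining replicas: fill from the least-loaded nodes not yet chosen.
    for n in by_load:
        if len(chosen) >= replicas:
            break
        if n not in chosen:
            chosen.append(n)
    return chosen


cluster = {
    "dn1": ("rackA", 0.70),
    "dn2": ("rackA", 0.30),
    "dn3": ("rackB", 0.50),
    "dn4": ("rackB", 0.20),
}
targets = choose_targets(cluster, writer_rack="rackA")
racks_used = {cluster[n][0] for n in targets}
```

The ordering of the three steps is a design choice in this sketch: locality is satisfied first, rack diversity second, and even spread is used as the tiebreaker throughout.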
The cluster-rebalancing feature of HDFS is just one mechanism it uses to sustain the integrity of its data.
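
The idea behind rebalancing can be sketched as a loop that moves blocks from the fullest node to the emptiest one until every node sits close to the cluster average. This is a simplified model under stated assumptions, not the algorithm HDFS uses: the function `plan_moves` is hypothetical, every node is assumed to have the same capacity, and all blocks are assumed to be the same size.

```python
def plan_moves(used_blocks, capacity_blocks, threshold=0.10):
    """Plan block moves until each node is within `threshold` of average utilization.

    used_blocks maps a node name to the number of blocks it stores;
    capacity_blocks is the (assumed identical) capacity of every node.
    Returns the list of (source, target) moves and the resulting block counts.
    """
    used = dict(used_blocks)  # work on a copy so the caller's data is untouched
    average = sum(used.values()) / (len(used) * capacity_blocks)
    moves = []
    while True:
        src = max(used, key=used.get)   # fullest node
        dst = min(used, key=used.get)   # emptiest node
        src_over = used[src] / capacity_blocks - average > threshold
        dst_under = average - used[dst] / capacity_blocks > threshold
        if not (src_over or dst_under):
            break                       # every node is within the threshold
        if used[src] - used[dst] < 2:
            break                       # another move could not narrow the gap
        used[src] -= 1
        used[dst] += 1
        moves.append((src, dst))
    return moves, used


before = {"dn1": 90, "dn2": 50, "dn3": 10}
moves, after = plan_moves(before, capacity_blocks=100, threshold=0.10)
```

A larger threshold ends the loop sooner and moves less data, while a smaller one gives a more even spread at the cost of more network traffic.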