
HDFS Architecture, Features & How To Access HDFS - Hadoop


Accessing HDFS

•   Applications can read and write HDFS files directly via the Java API (a minimal sketch of both access routes follows this list).
•   Typically, files are created on a local file system and must be moved into HDFS.
•   Likewise, files stored in HDFS may need to be moved to a machine's local file system.
•   Access to HDFS from the command line is achieved with the hadoop fs command.
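As a rough illustration of the Java API route, the sketch below writes a small file into HDFS and reads it back; the class name and the path /user/hadoop/example.txt are illustrative assumptions, not part of the original material.

  // Minimal sketch: write a file into HDFS via the Java API, then read it back.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsReadWrite {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();           // picks up core-site.xml / hdfs-site.xml
          FileSystem fs = FileSystem.get(conf);               // handle to the configured HDFS
          Path file = new Path("/user/hadoop/example.txt");   // hypothetical path

          // Write a small file into HDFS.
          try (FSDataOutputStream out = fs.create(file, true)) {
              out.writeUTF("hello hdfs");
          }

          // Read it back.
          try (FSDataInputStream in = fs.open(file)) {
              System.out.println(in.readUTF());
          }
          fs.close();
      }
  }

The same file could then be listed from the command line with hadoop fs -ls /user/hadoop.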

HDFS Architecture
 
•   HDFS has a master/slave architecture.
•   An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients (see the sketch after this list).
•   In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on.
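A minimal sketch of the client side of this arrangement is below: the client is pointed at the single NameNode through the fs.defaultFS setting and obtains a FileSystem handle from it. The hostname, port, and class name are assumptions for illustration.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class ConnectToNameNode {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Hypothetical NameNode address; normally supplied by core-site.xml.
          conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
          FileSystem fs = FileSystem.get(conf);   // namespace operations go through the NameNode
          System.out.println("Connected to " + fs.getUri());
          fs.close();
      }
  }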


•   In HDFS, a stored file is split into one or more blocks, and these blocks are stored in a set of DataNodes.
•   The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories.
•   It also determines the mapping of blocks to DataNodes; the DataNodes perform block creation, deletion, and replication upon instruction from the NameNode (a sketch of querying this mapping follows this list).
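As a hedged illustration of that block-to-DataNode mapping, the sketch below asks for the block locations of one file; the path and class name are assumptions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowBlockLocations {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          FileStatus status = fs.getFileStatus(new Path("/user/hadoop/bigfile.dat")); // hypothetical file

          // One BlockLocation per block; each lists the DataNodes holding a replica.
          for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
              System.out.println("offset " + block.getOffset()
                      + " length " + block.getLength()
                      + " hosts " + String.join(",", block.getHosts()));
          }
          fs.close();
      }
  }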

HDFS Features:

1. Hardware Failure
•   Hardware failure is the norm rather than the exception.
•   An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data.
•   Files are replicated to handle hardware failure; HDFS detects failures and recovers from them (a minimal sketch of per-file replication follows this list).
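A small sketch of working with that replication from the Java API is below; the file path and the factor of 3 are assumptions, and replication can also be changed from the command line with hadoop fs -setrep.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RaiseReplication {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path file = new Path("/user/hadoop/important.dat");   // hypothetical file
          fs.setReplication(file, (short) 3);                    // ask the NameNode for 3 replicas of each block
          fs.close();
      }
  }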

2. Streaming Data Access
•   Applications that run on HDFS need streaming access to their data sets.
•   HDFS is designed more for batch processing than for interactive use by users.
•   The emphasis is on high throughput of data access rather than low latency of data access.

3. Large Data Sets
•   Applications that run on HDFS have large data sets.
•   A typical file in HDFS is gigabytes to terabytes in size.
•   Thus, HDFS is tuned to support large files; it should provide high aggregate data bandwidth, scale to hundreds of nodes in a single cluster, and support tens of millions of files in a single instance.

4. Simple Coherency Model
•   HDFS applications need a write-once-read-many access model for files.
•   A file, once created, written, and closed, need not be changed; this assumption simplifies data coherency issues and enables high-throughput data access.
•   A MapReduce application or a web crawler application fits perfectly with this model; there is a plan to support appending writes to files in the future.

5. Moving Computation is Cheaper than Moving Data
•   A computation requested by an application is much more efficient if it is executed near the data it operates on; this is especially true when the size of the data set is huge.
•   This minimizes network congestion and increases the overall throughput of the system; the assumption is that it is often better to migrate the computation closer to where the data is located than to move the data to where the application is running.
•   HDFS provides interfaces for applications to move themselves closer to where the data is located.

6. Portability Across Heterogeneous Hardware and Software Platforms
•   HDFS has been designed to be easily portable from one platform to another.
•   This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.


HDFS Accessibility

1. HDFS can be accessed from applications in many different ways.
2. Natively, HDFS provides a Java API for applications to use; it can also be accessed through the FS Shell command-line interface and a browser interface.

Command line Interface to HDFS

•   To get a directory listing of the user's home directory in HDFS: hadoop fs -ls
•   To get a directory listing of the HDFS root directory: hadoop fs -ls /
•   To copy the file foo.txt from the local disk to the user's home directory in HDFS:
 hadoop fs -copyFromLocal foo.txt foo.txt
 This will copy the file to /user/username/foo.txt
•   To display the contents of an HDFS file:
 Ex:- file path is /user/fred/bar.txt
 Command is hadoop fs -cat /user/fred/bar.txt
•   To copy that file to the local disk, naming it baz.txt: hadoop fs -copyToLocal /user/fred/bar.txt baz.txt
•   To create a directory called input under the user's home directory: hadoop fs -mkdir input
•   To delete the directory input and all its contents: hadoop fs -rmr input
•   Hadoop Archives (HAR files) appear in HDFS as directories; listing one shows its index files and part files:
 % hadoop fs -ls /my/files.har
 Found 3 items
 -rw-r--r--  10 tom supergroup  165 2009-04-09 19:13 /my/files.har/_index
 -rw-r--r--  10 tom supergroup   23 2009-04-09 19:13 /my/files.har/_masterindex
 -rw-r--r--   1 tom supergroup    2 2009-04-09 19:13 /my/files.har/part-0

•   This directory listing shows that a HAR file is made up of two index files and a collection of part files; the part files comprise the contents of a number of the original files concatenated together.
•   The following command recursively lists the files in the archive: % hadoop fs -lsr har:///my/files.har

 drw-r--r--   - tom supergroup  0 2009-04-09 19:13 /my/files.har/my
 drw-r--r--   - tom supergroup  0 2009-04-09 19:13 /my/files.har/my/files
 -rw-r--r--  10 tom supergroup  1 2009-04-09 19:13 /my/files.har/my/files/a
 drw-r--r--   - tom supergroup  0 2009-04-09 19:13 /my/files.har/my/files/dir
 -rw-r--r--  10 tom supergroup  1 2009-04-09 19:13 /my/files.har/my/files/dir/b
•   This is quite straightforward if the file system that the HAR file is on is the default file system.
•   If you want to refer to a HAR file on a different file system, then you need to use a different form of the path URI than normal.
•   To delete a HAR file, you need to use the recursive form of delete, since from the underlying file system's point of view the HAR file is a directory: % hadoop fs -rmr /my/files.har

Limitations:

 1. There are a few limitations to be aware of with HAR files.
 2. Creating an archive creates a copy of the original files, so you need as much disk space as the files you are archiving in order to create the archive.
 3. There is currently no support for archive compression, although the files that go into the archive can themselves be compressed.

 

