
Apache Hadoop Ecosystem


Hadoop Ecosystem

1. Large volumes of data accumulated on the web.
2. Nutch was built to crawl this web data.
3. The large volume of data had to be stored – HDFS was introduced.
4. How to use this data? Reporting and analytics.
5. The MapReduce framework was built for coding and running analytics (a minimal sketch appears after this list).
6. Unstructured data – weblogs, click streams, Apache logs.
Server logs – Fuse, WebDAV, Chukwa, Flume, and Scribe.
7. Sqoop and HIHO for loading RDBMS data into HDFS.
8. High-level interfaces were required over low-level MapReduce programming – Hive, Pig, Jaql.
9. BI tools with advanced UI reporting.
10. Monitoring and managing Hadoop, running jobs/Hive queries, viewing HDFS – a high-level view – Hue, Karmasphere, the Eclipse plug-in, Cacti, Ganglia.
11. Workflow tools over MapReduce processes and high-level languages – Oozie.
12. Support frameworks – Avro (serialization), ZooKeeper (coordination).
13. More high-level interfaces/uses – Mahout, Elastic MapReduce.
14. OLTP workloads are also possible with HBase.
15. Lucene is a text search engine library written in Java (a small indexing/search sketch also follows this list).
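
To make item 5 concrete, here is a minimal WordCount sketch against the org.apache.hadoop.mapreduce API: the mapper emits (word, 1) pairs and the reducer sums them per word. The class names and the command-line input/output paths are illustrative, not from this post.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, this would run with something like hadoop jar wordcount.jar WordCount /input /output (paths again illustrative).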
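
And for item 15, a minimal Lucene sketch (Lucene 5+ style API) that builds a tiny index on local disk and runs one query against it. The index directory, field name, and sample text are all illustrative.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(Paths.get("lucene-index")); // local index dir
    StandardAnalyzer analyzer = new StandardAnalyzer();

    // Index a single document with one analyzed text field.
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new TextField("body", "Hadoop ecosystem tools for big data", Field.Store.YES));
      writer.addDocument(doc);
    }

    // Search the index for a term and print matching documents.
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Query query = new QueryParser("body", analyzer).parse("hadoop");
      for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
        System.out.println(searcher.doc(hit.doc).get("body"));
      }
    }
  }
}
```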

Interested in mastering Hadoop? Enroll now for a FREE demo of Hadoop Training.

Different Components of the Hadoop Ecosystem

Hadoop is best known for MapReduce and its distributed file system (HDFS, renamed from NDFS, the Nutch Distributed File System).

Note: The Hadoop name is also used for a family of related projects that fall under the umbrella of infrastructure for distributed computing and large-scale data processing.
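
Because HDFS underpins almost everything else listed below, a minimal sketch of writing and reading an HDFS file through Hadoop's Java FileSystem API may help. The file path and message are illustrative, and the cluster address is assumed to come from core-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/tmp/hello.txt"); // illustrative HDFS path

    // Write a small file into HDFS, overwriting if it already exists.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeBytes("Hello, HDFS!\n");
    }

    // Read the same file back.
    try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
      System.out.println(in.readLine());
    }
  }
}
```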

1. HDFS
2. MapReduce
3. Hadoop Streaming
4. Hive and Hue
5. Pig
6. Sqoop
7. Oozie
8. HBase (a small client sketch follows this list)
9. Flume
10. Mahout
11. Fuse
12. ZooKeeper
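
As a taste of the OLTP-style access mentioned earlier, here is a minimal HBase client sketch (HBase 1.x+ API) that writes and reads back a single cell. It assumes a table named users with a column family info already exists, and that hbase-site.xml on the classpath points at the cluster's ZooKeeper quorum; all of those names are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGet {
  public static void main(String[] args) throws Exception {
    // ZooKeeper quorum and other settings come from hbase-site.xml.
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) { // table assumed to exist

      // Write one cell: row "user1", column family "info", qualifier "name".
      Put put = new Put(Bytes.toBytes("user1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);

      // Read the same cell back.
      Result result = table.get(new Get(Bytes.toBytes("user1")));
      byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println(Bytes.toString(name));
    }
  }
}
```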

List of Other Big Data Courses:

 • Hadoop Administration
 • MapReduce
 • Big Data on AWS
 • Informatica Big Data Integration
 • Big Data Greenplum DBA
 • Informatica Big Data Edition
 • Hadoop Hive
 • Impala
 • Hadoop Testing
 • Apache Mahout