1. Large volumes of data on the web.
2. Nutch was built to crawl this web data.
3. The large volume of crawled data had to be stored, so HDFS was introduced.
4. How to use this data? Generate reports from it.
5. The MapReduce framework was built for coding and running analytics.
6. Unstructured data such as web logs, click streams, Apache logs and other server logs – loaded into HDFS with FUSE, WebDAV, Chukwa, Flume and Scribe.
7. Sqoop and HIHO for loading RDBMS data into HDFS.
8. High-level interfaces were needed over low-level MapReduce programming – Hive, Pig, Jaql.
9. BI tools with advanced UI reporting.
10. Workflow tools over MapReduce processes and high-level languages – Oozie.
11. Tools to monitor and manage Hadoop, run jobs and Hive queries, and get a high-level view of HDFS – Hue, Karmasphere, Eclipse plug-in, Cacti, Ganglia.
12. Support frameworks – Avro (serialization), ZooKeeper (coordination).
13. More high-level interfaces and uses – Mahout (machine learning), Elastic MapReduce (Amazon's hosted Hadoop service).
14. OLTP is also possible with HBase.
15. Lucene is a text search engine library written in Java.
Hadoop is best known for MapReduce and its distributed file system (HDFS, renamed from NDFS, the Nutch Distributed File System).
Note: the term Hadoop is also used for a family of related projects that fall under the umbrella of infrastructure for distributed computing and large-scale data processing.
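Item 5 above notes that MapReduce was built for coding and running analytics. As a rough illustration of the programming model only, the following single-process Python sketch simulates the classic word-count job: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen for this sketch.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, sorted by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer (hypothetical name): sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Run the mapper over every line, group by key, then call the reducer once per key.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /index.html", "GET /about.html", "POST /index.html"]
    print(word_count(logs))
    # {'/about.html': 1, '/index.html': 2, 'get': 2, 'post': 1}
```

On a real cluster the same mapper and reducer logic would typically be written against Hadoop's Java MapReduce API (or run through Hadoop Streaming), with the framework reading input from HDFS and handling partitioning, sorting and fault tolerance across nodes.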