Apache Mahout Tutorial

This tutorial gives an overview of Apache Mahout and covers its fundamentals.

  • Mahout is an open source machine learning library from Apache. The algorithms it implements fall under the broad umbrella of “machine learning” or “collective intelligence.” That can mean many things, but for Mahout at the moment it primarily means collaborative filtering (recommender engines), clustering, and classification.
  • Mahout is also scalable. It aims to be the machine learning tool of choice when the data to be processed is very large, perhaps far too large for a single machine. In its current incarnation, these scalable implementations are written in Java, and some portions are built upon Apache’s Hadoop distributed computation project.
  • While Mahout is, in theory, a project open to implementations of all kinds of machine learning techniques, it is in practice a project that focuses on three key areas of machine learning at the moment. These are recommender engines (collaborative filtering), clustering, and classification.
  • Recommender engines are the most immediately recognizable machine learning technique in use today. You will have seen services or sites that attempt to recommend books, movies, or articles based on your past actions. They try to infer tastes and preferences and identify unknown items that may be of interest.
  • Clustering turns up in less apparent but equally well-known contexts. As the name implies, clustering techniques attempt to group a large number of things together into clusters that share some similarity. It is a way to discover hierarchy and order in a large or hard-to-understand data set, and in that way reveal interesting patterns or make the data set easier to comprehend.
  • Classification techniques decide how much a thing is or isn’t part of some type or category, or whether it does or doesn’t have some attribute. Classification is likewise ubiquitous, though even more behind the scenes. Often these systems “learn” by reviewing many examples of items in the categories in question in order to deduce classification rules.
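
To make the recommender idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Java. It does not use Mahout's API. The class name RecommenderSketch, the sample users, and the ratings are all hypothetical, and cosine similarity over co-rated items is just one of several similarity measures that could be used.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny in-memory recommender, not Mahout code.
class RecommenderSketch {

    // Cosine similarity computed over the items both users have rated.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns the item the user has not rated that has the highest
    // similarity-weighted average rating, or null if there is none.
    static String recommend(String user, Map<String, Map<String, Double>> ratings) {
        Map<String, Double> mine = ratings.get(user);
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = similarity(mine, other.getValue());
            if (sim <= 0) continue;
            for (Map.Entry<String, Double> r : other.getValue().entrySet()) {
                if (mine.containsKey(r.getKey())) continue; // already seen
                weightedSum.merge(r.getKey(), sim * r.getValue(), Double::sum);
                weightTotal.merge(r.getKey(), sim, Double::sum);
            }
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String item : weightedSum.keySet()) {
            double score = weightedSum.get(item) / weightTotal.get(item);
            if (score > bestScore) {
                bestScore = score;
                best = item;
            }
        }
        return best;
    }

    // Hypothetical sample data: three users rating items A to D on a 1-5 scale.
    static Map<String, Map<String, Double>> sampleRatings() {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        Map<String, Double> alice = new HashMap<>();
        alice.put("A", 5.0); alice.put("B", 1.0);
        Map<String, Double> bob = new HashMap<>();
        bob.put("A", 5.0); bob.put("B", 1.0); bob.put("C", 5.0); bob.put("D", 2.0);
        Map<String, Double> carol = new HashMap<>();
        carol.put("A", 1.0); carol.put("B", 5.0); carol.put("C", 1.0); carol.put("D", 5.0);
        ratings.put("alice", alice);
        ratings.put("bob", bob);
        ratings.put("carol", carol);
        return ratings;
    }

    public static void main(String[] args) {
        System.out.println(recommend("alice", sampleRatings())); // prints C
    }
}
```

Alice's ratings match Bob's far more closely than Carol's, so Bob's opinion of the items she has not seen carries more weight and item C comes out on top.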

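The grouping step behind clustering can be sketched in the same way. The example below runs k-means, one common clustering technique, on one-dimensional points in plain Java. It is not Mahout's implementation. The class name KMeansSketch, the sample points, and the starting centroids are hypothetical and chosen only to keep the example small.

```java
// Illustrative only: k-means (Lloyd's algorithm) on 1-D data, not Mahout code.
class KMeansSketch {

    // Index of the centroid closest to point p.
    static int nearest(double p, double[] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Repeats "assign each point to its nearest centroid, then move each
    // centroid to the mean of its points" until nothing changes or the
    // iteration limit is reached. Returns the final centroids.
    static double[] cluster(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (double p : points) {
                int c = nearest(p, centroids);
                sum[c] += p;
                count[c]++;
            }
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) continue; // leave an empty cluster where it is
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) changed = true;
                centroids[c] = updated;
            }
            if (!changed) break;
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centroids = cluster(points, new double[] {1, 12}, 10);
        System.out.println(centroids[0] + " " + centroids[1]); // prints 2.0 11.0
    }
}
```

The two centroids settle in the middle of the two obvious groups, which is the "discover order in the data" behaviour described above, on a deliberately tiny data set.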

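Classification by learning from labeled examples can also be sketched briefly. The example below is a nearest-centroid classifier in plain Java: it averages the training examples of each category and assigns a new item to the category with the closest average. It is a simple stand-in for illustration, not one of Mahout's classifiers. The class name ClassifierSketch, the "spam" and "ham" labels, and the two numeric features are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a nearest-centroid classifier, not Mahout code.
class ClassifierSketch {
    private final Map<String, double[]> sums = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    // "Learning": accumulate the feature totals for each label.
    void train(String label, double[] features) {
        double[] sum = sums.computeIfAbsent(label, k -> new double[features.length]);
        for (int i = 0; i < features.length; i++) {
            sum[i] += features[i];
        }
        counts.merge(label, 1, Integer::sum);
    }

    // Returns the label whose average example is closest (squared
    // Euclidean distance) to the given features, or null if untrained.
    String classify(double[] features) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            double dist = 0;
            for (int i = 0; i < features.length; i++) {
                double diff = features[i] - e.getValue()[i] / n;
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ClassifierSketch classifier = new ClassifierSketch();
        // Hypothetical features: {suspicious word count, known-contact word count}
        classifier.train("spam", new double[] {8, 1});
        classifier.train("spam", new double[] {9, 0});
        classifier.train("ham", new double[] {1, 7});
        classifier.train("ham", new double[] {0, 9});
        System.out.println(classifier.classify(new double[] {7, 2})); // prints spam
    }
}
```

Real classifiers usually learn richer rules than a per-category average, but the pattern is the same: review labeled examples first, then use what was learned to label new items.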