MongoDB introduced one of its most important features, the aggregation framework, in v2.2. It was offered as a simpler and faster alternative to Map/Reduce, which had been the main way to aggregate data up to that version. Map/Reduce remains available, but for most aggregation tasks the framework has taken its place since v2.2.
Aggregation, in its simplest sense, means performing operations on documents and computing a result from them. Aggregations are a set of functions that let us manipulate the data returned from queries on MongoDB. Aggregation operations group values from multiple documents together, and further operations can then be performed on the grouped data to return a single result. As of v3.4, there are three ways to perform aggregation in MongoDB: the aggregation pipeline, Map/Reduce, and single-purpose aggregation operations.
Now that we know what aggregation is and which methods are available, let us discuss each method and see an example of how it can be used. First, we will set up some test data for reference, so that the results of each aggregation method are easier to understand.
Let us consider the following Collection as our base and fill it with some test data so that we can run our aggregation queries on it.
The queries are provided here for practice and easy reference:
MongoDB models its aggregation framework on the concept of data processing pipelines: the documents of a collection enter a multi-stage pipeline that transforms them into an aggregated result. The most basic pipeline stages provide filters that operate like queries, and document transformations that modify the form of the output document. Other pipeline operations provide tools for grouping and sorting documents by specified fields. Pipeline stages can also perform intermediate calculations, such as computing an average or a sum or concatenating strings, before passing documents to the next stage of the pipeline.
Pipelines provide efficient data aggregation using operations native to the MongoDB database server, and they are the preferred method for aggregating data in MongoDB. The aggregation pipeline can operate on a sharded collection, and it can use indexes to improve performance during some of its stages.
Let us check out a simple example of applying data aggregation via the aggregation pipelines:
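The following is a minimal sketch rather than a definitive implementation. It assumes a hypothetical `orders` collection with `customer`, `status` and `amount` fields; none of these names come from the article's test data. The comment shows how such a pipeline would typically be passed to `db.collection.aggregate()` in the mongo shell, and the plain JavaScript below it emulates the same three stages on an in-memory array so the flow of documents from stage to stage can be followed without a running server. The helpers `match`, `groupSum`, `sortByTotalDesc` and `runPipeline` are illustrative stand-ins, not MongoDB APIs.

```javascript
// Hypothetical sample documents standing in for an "orders" collection.
const orders = [
  { _id: 1, customer: "alice", status: "shipped", amount: 40 },
  { _id: 2, customer: "bob", status: "shipped", amount: 25 },
  { _id: 3, customer: "alice", status: "pending", amount: 10 },
  { _id: 4, customer: "alice", status: "shipped", amount: 15 },
];

// In the mongo shell, the equivalent pipeline would look like this:
//   db.orders.aggregate([
//     { $match: { status: "shipped" } },
//     { $group: { _id: "$customer", total: { $sum: "$amount" } } },
//     { $sort:  { total: -1 } }
//   ])

// Stage 1 (like $match): keep documents whose fields equal the given values.
const match = (criteria) => (docs) =>
  docs.filter((d) => Object.entries(criteria).every(([k, v]) => d[k] === v));

// Stage 2 (like $group with $sum): group by one field, sum another.
const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d[keyField], (totals.get(d[keyField]) || 0) + d[sumField]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Stage 3 (like $sort with total: -1): order groups by total, descending.
const sortByTotalDesc = (docs) => [...docs].sort((a, b) => b.total - a.total);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const pipelineResult = runPipeline(orders, [
  match({ status: "shipped" }),
  groupSum("customer", "amount"),
  sortByTotalDesc,
]);

console.log(pipelineResult);
// [ { _id: 'alice', total: 55 }, { _id: 'bob', total: 25 } ]
```

Each stage only sees what the previous stage produced, which is why filtering early, as `$match` does here, reduces the work left for the grouping stage.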
MongoDB also supports data aggregation through map-reduce operations. A Map/Reduce operation has two phases: the map phase, which processes each document of a collection and emits one or more objects for each input document, and the reduce phase, which combines the output of the map phase. Optionally, a finalize stage can make final modifications to the result obtained from the map and reduce phases.
Map/Reduce uses custom JavaScript functions to perform the map and reduce operations. Custom JavaScript offers a great deal of flexibility compared to the aggregation pipeline, but Map/Reduce is more complex and less efficient. Map/Reduce can operate on a sharded collection, and its output can also be written to a sharded collection.
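The map, reduce and finalize phases described above can be sketched as follows. This is an illustrative emulation in plain JavaScript, assuming the same kind of hypothetical order documents (`customer`, `amount`); the `emit` function and the driver loop below are stand-ins for what the server does internally. In the mongo shell, functions shaped like `mapFn`, `reduceFn` and `finalizeFn` would typically be passed to `db.collection.mapReduce()`, as shown in the comment.

```javascript
// Hypothetical sample documents standing in for an "orders" collection.
const orderDocs = [
  { _id: 1, customer: "alice", amount: 40 },
  { _id: 2, customer: "bob", amount: 25 },
  { _id: 3, customer: "alice", amount: 10 },
  { _id: 4, customer: "alice", amount: 15 },
];

// In the mongo shell, the call would look roughly like this
// ("order_totals" is an assumed output collection name):
//   db.orders.mapReduce(mapFn, reduceFn,
//                       { out: "order_totals", finalize: finalizeFn })

// Stand-in for the server-side emit(): collects values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map phase: runs once per document, with `this` bound to the document.
function mapFn() {
  emit(this.customer, this.amount);
}

// Reduce phase: combines all values emitted for one key into one value.
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Optional finalize phase: reshapes the reduced value.
function finalizeFn(key, reducedValue) {
  return { total: reducedValue };
}

// Driver: apply map to every document, then reduce and finalize per key.
for (const doc of orderDocs) {
  mapFn.call(doc);
}
const mapReduceResult = [...emitted].map(([key, values]) => {
  // MongoDB does not call reduce for a key that has only a single value.
  const reduced = values.length > 1 ? reduceFn(key, values) : values[0];
  return { _id: key, value: finalizeFn(key, reduced) };
});

console.log(JSON.stringify(mapReduceResult));
// [{"_id":"alice","value":{"total":65}},{"_id":"bob","value":{"total":25}}]
```

Because reduce may be skipped for single-value keys, the emitted value and the reduced value should have the same shape, as they do here (both are plain numbers).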
Now let us look at the third method of data aggregation that MongoDB makes available. Single-purpose aggregation operations, such as count() and distinct(), aggregate documents from a single collection. They are very simple to use, but they lack the flexibility and capabilities of the aggregation pipeline and Map/Reduce.
Let us see how these can be implemented on the data set that we already have with us for this article.
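As a minimal sketch, assuming a hypothetical `orders` collection with `customer` and `status` fields, the two operations would typically be called in the mongo shell as shown in the comment below. The plain JavaScript helpers `countDocs` and `distinctValues` are illustrative stand-ins that reproduce the same results on an in-memory array; they are not MongoDB APIs.

```javascript
// Hypothetical sample documents standing in for an "orders" collection.
const orderList = [
  { _id: 1, customer: "alice", status: "shipped" },
  { _id: 2, customer: "bob", status: "shipped" },
  { _id: 3, customer: "alice", status: "pending" },
  { _id: 4, customer: "alice", status: "shipped" },
];

// In the mongo shell, the equivalent calls would be:
//   db.orders.count({ status: "shipped" })
//   db.orders.distinct("customer")

// Like count(): number of documents whose fields equal the query values.
const countDocs = (docs, query = {}) =>
  docs.filter((d) => Object.entries(query).every(([k, v]) => d[k] === v)).length;

// Like distinct(): unique values of one field across the documents.
const distinctValues = (docs, field) => [
  ...new Set(docs.filter((d) => field in d).map((d) => d[field])),
];

console.log(countDocs(orderList, { status: "shipped" })); // 3
console.log(countDocs(orderList)); // 4
console.log(distinctValues(orderList, "customer")); // [ 'alice', 'bob' ]
```

Each operation answers exactly one question about one collection, which is what makes these operations easy to use and also what limits them.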
In this article, we have looked at how MongoDB supports data aggregation and at the three variants it provides. We have also covered the pros and cons of each of these aggregation operations.
One point to note is that the aggregation pipeline is generally the preferred way to perform data aggregation in MongoDB.