MongoDB introduced one of its most important features, the aggregation framework, in v2.2. Until then, Map/Reduce was the main way to aggregate data in MongoDB. Since v2.2, the aggregation framework has taken over most of that work and has become the preferred approach, while Map/Reduce remains available as an alternative.
Aggregation, in its simplest sense, means performing operations on documents and computing a result from them. Aggregation operations are a set of functions that let us process the data returned by queries on MongoDB.
Aggregation operations group values from multiple documents together and can perform further operations on the grouped data to return a single result. As of v3.4, there are three ways to perform aggregation in MongoDB: the aggregation pipeline, the Map/Reduce function, and single-purpose aggregation operations.
With this background on what aggregation is and which methods are available, let us discuss each method and see an example of how it can be used.
First, we will set up some test data so that the results of each aggregation method are easier to follow.
Let us use the following collection as our base and fill it with some test data to run our aggregation queries against.
The commands are provided here for practice and easy access:
use SampleAggregationDB
// Create the collection under the same name that the queries in this article use.
db.createCollection("AggregationCollection")
// ObjectId() takes a 24-character hex string, so the ids are zero-padded to that length.
db.AggregationCollection.insertMany([
{ _id: ObjectId('000000000000012345678912'), title: 'DragonStone', description: 'GOT Season 7 Episode 1', directed_by: 'Matt Shakman', tags: ['drogon', 'danerys'], likes: 100 },
{ _id: ObjectId('000000000000012345678913'), title: 'Stormborn', description: 'GOT Season 7 Episode 2', directed_by: 'Matt Shakman', tags: ['jon', 'sansa'], likes: 10 },
{ _id: ObjectId('000000000000012345678914'), title: 'The Queens Justice', description: 'GOT Season 7 Episode 3', directed_by: 'Matt Shakman', tags: ['cersei', 'danerys'], likes: 750 },
{ _id: ObjectId('000000000000012345678915'), title: 'The Spoils of War', description: 'GOT Season 7 Episode 4', directed_by: 'Matt Shakman', tags: ['jaime', 'cersei'], likes: 10000 },
{ _id: ObjectId('000000000000012345678916'), title: 'EastWatch', description: 'GOT Season 7 Episode 5', directed_by: 'Matt Shakman', tags: ['jon', 'danerys'], likes: 1250 }
])
MongoDB models its aggregation framework on the concept of data processing pipelines: the documents of a collection enter a multi-stage pipeline that transforms them into an aggregated result. The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.
Other pipeline operations provide tools for grouping and sorting documents by specified fields. Stages can also perform interim calculations such as an average, a sum, or a string concatenation before passing documents to the next stage of the pipeline.
Because pipelines run as native operations inside the MongoDB server, they are an efficient way to aggregate data and are the preferred method for aggregation in MongoDB.
The aggregation pipeline can also operate on a sharded collection, and it can use indexes to improve performance during some of its stages.
Let us check out a simple example of applying data aggregation via the aggregation pipelines:
db.AggregationCollection.aggregate([
{$match : { directed_by : "Matt Shakman"} },
{$group : {_id : "$title", noOfEpisodes : {$sum : 1}}}
])
The result is:
/* 1 */
{
"_id" : "The Spoils of War",
"noOfEpisodes" : 1.0
},
/* 2 */
{
"_id" : "The Queens Justice",
"noOfEpisodes" : 1.0
},
/* 3 */
{
"_id" : "EastWatch",
"noOfEpisodes" : 1.0
},
/* 4 */
{
"_id" : "Stormborn",
"noOfEpisodes" : 1.0
},
/* 5 */
{
"_id" : "DragonStone",
"noOfEpisodes" : 1.0
}
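To make the flow through the stages concrete, the sketch below mimics a $match stage followed by a $group stage in plain JavaScript, using an in-memory copy of the test data so that it runs in Node.js without a MongoDB server. The helper names matchStage and groupStage are made up for this illustration and are not part of any MongoDB API. It groups by directed_by rather than title, an assumption made here so that the accumulators have more than one document to work on.

```javascript
// In-memory copy of the relevant fields from the test data above.
const episodes = [
  { title: "DragonStone", directed_by: "Matt Shakman", likes: 100 },
  { title: "Stormborn", directed_by: "Matt Shakman", likes: 10 },
  { title: "The Queens Justice", directed_by: "Matt Shakman", likes: 750 },
  { title: "The Spoils of War", directed_by: "Matt Shakman", likes: 10000 },
  { title: "EastWatch", directed_by: "Matt Shakman", likes: 1250 },
];

// Hypothetical stand-in for $match: keep documents whose fields equal the filter.
function matchStage(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// Hypothetical stand-in for $group with a count, a sum and an average of likes.
function groupStage(docs, keyField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const group = groups.get(key) || { _id: key, noOfEpisodes: 0, totalLikes: 0 };
    group.noOfEpisodes += 1;
    group.totalLikes += doc.likes;
    groups.set(key, group);
  }
  // The average is derived once all documents of a group have been seen.
  return [...groups.values()].map((group) => ({
    ...group,
    avgLikes: group.totalLikes / group.noOfEpisodes,
  }));
}

// Each stage receives the output of the previous one, as in a pipeline.
const pipelineResult = groupStage(
  matchStage(episodes, { directed_by: "Matt Shakman" }),
  "directed_by"
);
console.log(pipelineResult);
```

Against a real server, the matching stage would be written as {$group : {_id : "$directed_by", noOfEpisodes : {$sum : 1}, totalLikes : {$sum : "$likes"}, avgLikes : {$avg : "$likes"}}}.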
MongoDB also lets you aggregate data with the Map/Reduce method. Map/Reduce consists of two phases: the map phase, which processes each document of a collection and emits one or more objects for each input document, and the reduce phase, which combines the output of the map phase.
Optionally, a finalize stage can make final modifications to the result obtained from the map and reduce phases.
Map/Reduce uses custom JavaScript functions to perform the map and reduce operations. The custom JavaScript offers a great deal of flexibility compared to the aggregation pipeline, but Map/Reduce is more complex and also less efficient.
Map/Reduce can aggregate data on a sharded collection, and its output can also be written to a sharded collection.
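db.AggregationCollection.mapReduce(
// Map: emit one (title, likes) pair per document.
function() { emit(this.title, this.likes); },
// Reduce: sum the likes emitted for each title.
function(key, values) { return Array.sum(values); },
{
query : { directed_by : "Matt Shakman" }, // only process this director's episodes
out : "likesCount" // write the results to the likesCount collection
}
)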
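The three phases described above can be sketched in plain JavaScript over an in-memory copy of the test data, so the example runs in Node.js without a server. The function runMapReduce is a hypothetical helper written for this illustration, not a MongoDB API, and the map function receives the document as an argument instead of through this. It emits one pair per tag, an assumption chosen here so that some keys collect several values and the reduce phase has real work to do.

```javascript
// In-memory copy of the relevant fields from the test data above.
const taggedEpisodes = [
  { title: "DragonStone", tags: ["drogon", "danerys"], likes: 100 },
  { title: "Stormborn", tags: ["jon", "sansa"], likes: 10 },
  { title: "The Queens Justice", tags: ["cersei", "danerys"], likes: 750 },
  { title: "The Spoils of War", tags: ["jaime", "cersei"], likes: 10000 },
  { title: "EastWatch", tags: ["jon", "danerys"], likes: 1250 },
];

// Hypothetical driver that mimics the map, reduce and finalize phases.
function runMapReduce(input, mapFn, reduceFn, finalizeFn) {
  const emitted = new Map(); // key -> list of values emitted for that key
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // Map phase: each document may emit zero or more (key, value) pairs.
  for (const doc of input) mapFn(doc, emit);

  // Reduce phase: collapse the values of each key into a single value.
  // As in MongoDB, reduce is skipped for keys that have only one value.
  const output = {};
  for (const [key, values] of emitted) {
    const reduced = values.length > 1 ? reduceFn(key, values) : values[0];
    // Finalize phase (optional): post-process the reduced value.
    output[key] = finalizeFn ? finalizeFn(key, reduced) : reduced;
  }
  return output;
}

const likesPerTag = runMapReduce(
  taggedEpisodes,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, doc.likes)),
  (key, values) => values.reduce((sum, value) => sum + value, 0),
  (key, total) => ({ totalLikes: total })
);
console.log(likesPerTag);
```

Because reduce may be skipped for a key or applied to partial results, it should return a value of the same shape as the values emitted by map, which is why the wrapping into an object is left to the finalize step in this sketch.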
Now let us look at the third method that MongoDB makes available. Single-purpose aggregation operations such as count() and distinct() return the number of matching documents and the distinct values of a field, respectively.
These operations aggregate documents from a single collection. They are simple to use, but they lack the flexibility and capability of the aggregation pipeline or Map/Reduce.
Let us see how they work on the data set that we already have for this article.
db.AggregationCollection.count() returns 5, the number of documents in the collection.
db.AggregationCollection.distinct("directed_by") returns [ "Matt Shakman" ], the only distinct value of that field.
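As a rough illustration of what these two operations compute, the sketch below reproduces them in plain JavaScript over an in-memory copy of the test data. The helpers countDocs and distinctValues are hypothetical names used only here. The sketch also reflects one detail of distinct() worth knowing: when the field holds an array, each element is treated as a separate value.

```javascript
// In-memory copy of the relevant fields from the test data above.
const collection = [
  { directed_by: "Matt Shakman", tags: ["drogon", "danerys"] },
  { directed_by: "Matt Shakman", tags: ["jon", "sansa"] },
  { directed_by: "Matt Shakman", tags: ["cersei", "danerys"] },
  { directed_by: "Matt Shakman", tags: ["jaime", "cersei"] },
  { directed_by: "Matt Shakman", tags: ["jon", "danerys"] },
];

// Hypothetical stand-in for count(): documents matching an optional equality filter.
function countDocs(docs, filter = {}) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  ).length;
}

// Hypothetical stand-in for distinct(): unique values of a field,
// with array fields contributing each of their elements.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // skip documents without the field
    for (const item of Array.isArray(value) ? value : [value]) seen.add(item);
  }
  return [...seen];
}

const total = countDocs(collection);
const directors = distinctValues(collection, "directed_by");
const tags = distinctValues(collection, "tags");
console.log(total, directors, tags);
```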
In this article, we looked at how MongoDB performs data aggregation and at the three variants it provides: the aggregation pipeline, Map/Reduce, and single-purpose aggregation operations. We also covered the pros and cons of each.
One point to note is that the aggregation pipeline is generally the best way to perform data aggregation in MongoDB.
Prasanthi is an expert writer in MongoDB, and has written for various reputable online and print publications. At present, she is working for MindMajix, and writes content not only on MongoDB, but also on Sharepoint, Uipath, and AWS.