MongoDB's Aggregation Framework
No matter how well-designed your database schema is, there will always be cases where you need more flexibility. Even with MongoDB's powerful query engine, you might run into complex scenarios that demand more. Enter the Aggregation Framework.
The Aggregation Framework provides a structured way to manipulate and process data, allowing you to extract precisely the information you need.
Aggregation Pipelines
At the core of the Aggregation Framework are "pipelines" composed of "stages". While there are many distinct stages available, they can be categorized into four main actions: filtering, transforming, grouping, and sorting.
Each stage is written as a $ followed by the stage type, which defines what the stage does, for example $limit, $sort, or $match.
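The order of stages matters, because each stage operates on the documents produced by the stage before it. MongoDB's query optimizer can reorder or combine some stages when doing so doesn't change the result, but a few best practices still help streamline your pipeline. For example, place $match as early as possible to reduce the number of documents that later stages must process. A $match at the start of the pipeline can also take advantage of indexes.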
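Each stage receives the output of the previous one, so the order in which stages are written determines the result. The optimizer only rearranges stages when the output stays the same.

The sketch below imitates a $sort followed by a $limit using plain JavaScript arrays, then swaps the two stages. It only illustrates pipeline semantics; it is not how MongoDB executes stages. The recipes and the helper names (runPipeline, limit, sortByServingsDesc) are hypothetical.

```javascript
// Plain-JavaScript illustration (not MongoDB code): each "stage" is a
// function from an array of documents to a new array of documents.
const recipes = [
  { name: "Pancakes", servings: 4 },
  { name: "Lasagna", servings: 8 },
  { name: "Omelette", servings: 1 },
  { name: "Chili", servings: 6 },
];

// Stand-ins for { $sort: { servings: -1 } } and { $limit: 2 }.
const sortByServingsDesc = (docs) =>
  [...docs].sort((a, b) => b.servings - a.servings);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding each one the previous stage's output.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const sortThenLimit = runPipeline(recipes, [sortByServingsDesc, limit(2)]);
const limitThenSort = runPipeline(recipes, [limit(2), sortByServingsDesc]);

console.log(sortThenLimit.map((d) => d.name)); // [ 'Lasagna', 'Chili' ]
console.log(limitThenSort.map((d) => d.name)); // [ 'Lasagna', 'Pancakes' ]
```

Sorting first gives the two largest recipes overall. Limiting first merely sorts whichever two documents happened to arrive first.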
When constructing your pipeline, start by defining what you want in the result and break it down into stages.
For instance, the first stage might filter documents based on specific conditions, like matching a field to a certain string or checking if a field is greater than or equal to a given value.
Subsequent stages can sort the documents or control which fields are included in the final output.
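As a sketch of that filter, sort, and project plan, the pipeline below is written as a plain array of stage objects. The cookbook collection matches the later examples, but the cuisine, servings, and title fields are assumptions made for illustration.

```javascript
// Hypothetical pipeline for a "cookbook" collection. The field names
// cuisine, servings, and title are assumptions for illustration.
const pipeline = [
  // 1. Filter: exact string match plus a "greater than or equal" check.
  { $match: { cuisine: "Italian", servings: { $gte: 4 } } },
  // 2. Sort the remaining documents by servings, largest first.
  { $sort: { servings: -1 } },
  // 3. Keep only the fields we want in the output (and drop _id).
  { $project: { _id: 0, title: 1, servings: 1 } },
];

// In mongosh this would run as: db.cookbook.aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0])); // [ '$match', '$sort', '$project' ]
```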
Building a Pipeline
Each stage in the aggregation pipeline is specified as an object in an array, and you can use the same stage type multiple times if needed. Instead of the find() method used for ordinary queries, you call the aggregate() method with that array, like this:
> db.collection.aggregate([stageOne, stageTwo])
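Reusing a stage type is useful when a later filter depends on a value computed mid-pipeline. In the sketch below, $match appears twice: once on a stored field and once on a field produced by $project. The title and rating fields and the 4-star threshold are hypothetical.

```javascript
// Hypothetical pipeline that uses $match twice: once on a stored field,
// and once on a field computed by the $project stage in between.
const topRatedPipeline = [
  // Only consider documents that actually have a rating field.
  { $match: { rating: { $exists: true } } },
  // Compute the average of the rating array into a new field.
  { $project: { title: 1, avgRating: { $avg: "$rating" } } },
  // Filter again, this time on the computed avgRating field.
  { $match: { avgRating: { $gte: 4 } } },
];

// In mongosh: db.cookbook.aggregate(topRatedPipeline)
```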
Your pipeline can consist of a single stage or as many as 1,000 stages, though memory is usually the practical constraint. By default, each individual stage is limited to 100 megabytes of RAM unless you allow the operation to use disk for temporary files, which is significantly slower.
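Disk use is requested through the options argument of aggregate(). The sketch below passes allowDiskUse: true. The $group pipeline and its cuisine field are hypothetical, and the database call is guarded so the snippet also runs outside mongosh.

```javascript
// Hypothetical grouping pipeline; the cuisine field is an assumption.
const groupPipeline = [
  { $group: { _id: "$cuisine", recipeCount: { $sum: 1 } } },
  { $sort: { recipeCount: -1 } },
];

// allowDiskUse lets memory-hungry stages write temporary files to disk
// rather than being held to the in-memory limit (at the cost of speed).
const options = { allowDiskUse: true };

// Only call the database when running inside mongosh, where `db` exists.
if (typeof db !== "undefined") {
  printjson(db.cookbook.aggregate(groupPipeline, options).toArray());
}
```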
Projecting Aggregated Fields
You may already know about projection in MongoDB, which find() uses to control which fields are returned in query results. With the Aggregation Framework's $project stage, you can go beyond this and create new fields by aggregating or composing existing ones.
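As a sketch of composing fields, the $project stage below keeps one field, adds two existing fields together, and counts an array's elements. The title, prepMinutes, cookMinutes, and rating fields are assumptions about the recipe documents.

```javascript
// Hypothetical $project stage for recipe documents. The title,
// prepMinutes, cookMinutes, and rating fields are assumptions.
const projectStage = {
  $project: {
    title: 1, // keep an existing field as-is
    // Compose a new field from two existing ones.
    totalMinutes: { $add: ["$prepMinutes", "$cookMinutes"] },
    // Count the ratings; $ifNull substitutes [] when rating is missing,
    // because $size requires an array.
    ratingCount: { $size: { $ifNull: ["$rating", []] } },
  },
};

// In mongosh: db.cookbook.aggregate([projectStage])
```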
Consider a scenario where you have recipe documents containing a rating array:
rating: [4, 2, 3, 3, 4, 5, 1, 2]
To obtain the average rating for each recipe, you can combine the $project stage with the $avg operator, which calculates the average of the numbers in the rating array. The result is assigned to a new field, avgRating, using a query like this:
> db.cookbook.aggregate([ { "$project": { "avgRating": { "$avg": "$rating" } } } ])
The "$rating" argument is a field path: the $ prefix tells MongoDB to use the value of the rating field rather than the literal string "rating".
The result includes the _id field, which $project returns by default, and the new avgRating field, which contains the average rating for each document. The results might look like this:
[
  { _id: ObjectId("636821387dd21c28fda4939f"), avgRating: 3.7142857142857144 },
  { _id: ObjectId("636aa92f7dd21c28fda493a0"), avgRating: 3.888888888888889 },
  { _id: ObjectId("636aa94c7dd21c28fda493a1"), avgRating: 4.777777777777778 },
  { _id: ObjectId("636aa9617dd21c28fda493a2"), avgRating: 3.888888888888889 },
  { _id: ObjectId("636aa9707dd21c28fda493a3"), avgRating: 5 },
  { _id: ObjectId("636aa9817dd21c28fda493a4"), avgRating: 4.357142857142857 },
  { _id: ObjectId("636ab56e956f91c56f02f049"), avgRating: null }
]
Note that this output is only the result of the query; the aggregation does not modify the underlying documents.
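The last document in the results has avgRating: null. $avg ignores non-numeric values and returns null when there is nothing numeric to average, for example when a document has no rating field.

The function below is a rough plain-JavaScript approximation of that behavior, for illustration only. approximateAvg is a hypothetical name, and BSON numeric types such as Decimal128 are not modeled.

```javascript
// Plain-JavaScript approximation of how $avg treats a field that holds
// an array: average the numeric elements, ignore everything else, and
// return null when there is nothing numeric to average.
function approximateAvg(value) {
  const values = Array.isArray(value) ? value : [value];
  const numbers = values.filter((v) => typeof v === "number");
  if (numbers.length === 0) return null;
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

console.log(approximateAvg([4, 2, 3, 3, 4, 5, 1, 2])); // 3
console.log(approximateAvg(undefined)); // null (e.g. no rating field)
```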
You can then take things a step further by adding more stages, here sorting the results by average rating, highest first:
> db.cookbook.aggregate([ { "$project": { "avgRating": { "$avg": "$rating" } } }, { "$sort": { "avgRating": -1 } } ])
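In MongoDB's comparison order, null sorts before numbers, so a descending sort on avgRating places documents with a null average last. The sketch below mimics that ordering in plain JavaScript for numbers and null only. The function name and sample documents are hypothetical.

```javascript
// Plain-JavaScript sketch of { $sort: { avgRating: -1 } } for numbers and
// null only: numbers in descending order, null/missing values last.
function sortByAvgRatingDesc(docs) {
  return [...docs].sort((a, b) => {
    const aNull = a.avgRating === null || a.avgRating === undefined;
    const bNull = b.avgRating === null || b.avgRating === undefined;
    if (aNull && bNull) return 0;
    if (aNull) return 1; // a goes after b
    if (bNull) return -1; // a goes before b
    return b.avgRating - a.avgRating;
  });
}

const sorted = sortByAvgRatingDesc([
  { _id: 1, avgRating: 3.7 },
  { _id: 2, avgRating: null },
  { _id: 3, avgRating: 5 },
]);
console.log(sorted.map((d) => d._id)); // [ 3, 1, 2 ]
```

Appending { "$limit": 3 } after the $sort stage would keep only the three highest-rated recipes.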
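Quotes around field names and operators are optional in most cases because the shell accepts JavaScript object syntax. However, they are required for keys that aren't valid JavaScript identifiers, such as dotted paths like "address.city". Including them consistently keeps your pipeline valid JSON, which makes it easier to validate and to reuse outside the shell.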
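A quick check in plain JavaScript shows that quoting a key that is already a valid identifier produces the same object. A dotted path, by contrast, only works as a quoted key. The author.name field is hypothetical.

```javascript
// Stage objects use JavaScript object syntax, so quoted and unquoted keys
// produce the same object when the key is a valid identifier.
const unquoted = { $sort: { avgRating: -1 } };
const quoted = { "$sort": { "avgRating": -1 } };
console.log(JSON.stringify(unquoted) === JSON.stringify(quoted)); // true

// A dotted path (a hypothetical nested field here) is not a valid
// identifier, so it must be quoted.
const nestedMatch = { $match: { "author.name": "Ada" } };
console.log(JSON.stringify(nestedMatch)); // {"$match":{"author.name":"Ada"}}
```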
MongoDB's Aggregation Framework lets you manipulate and process data with precision. By understanding how pipelines are assembled from stages such as $match, $project, and $sort, and how the order of those stages shapes the result, you'll be better equipped to use it in your applications.
So go ahead and start using aggregation to extract exactly the data you need from your MongoDB databases!