MongoDB Aggregation Framework: $match and $project Stages
In the introductory post we discussed the fundamentals of MongoDB's Aggregation Framework, and how it works by applying a number of stages (or steps) in a pipeline to achieve the desired result.
Unlike a single MongoDB query, utilizing the Aggregation Framework provides multiple interaction points with your data, offering various chances for data manipulation within a single operation.
Aggregation Framework queries run "server side" (that is, within your database). This means intermediate results don't need to travel back and forth between the database and your client, which usually makes your operations faster.
In this post we will discuss arguably the two "workhorse" stages when it comes to filtering documents: the $match and $project stages.
These versatile stages allow you to cherry-pick documents (and parts of documents) that match specific criteria, streamlining your data processing and making your pipelines more efficient.
Filtering Documents with the $match Stage
The $match stage is akin to a gatekeeper within the MongoDB aggregation pipeline. It lets you decide which documents are allowed to continue their journey through the pipeline and which are left behind. This filtering is done using standard MongoDB queries, making it an incredibly flexible and indispensable tool.
Typically you will want to run $match stages early in your pipelines to reduce the number of documents the later stages have to process.
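As a rough sketch of that ordering, the pipeline below filters first and sorts afterwards, so the sort only has to handle the documents that passed the filter. The field names (type, calories_per_serving) match the sample recipe document used in this post; the $sort stage and its sort key are assumptions added purely for illustration. The aggregate call is left as a comment because it needs a live mongosh session.

```javascript
// Sketch: put $match at the front so later stages see fewer documents.
// The $sort stage is an illustrative assumption, not part of the original post.
const pipeline = [
  // 1. Discard non-breakfast recipes as early as possible.
  { $match: { type: "Breakfast" } },
  // 2. Sort only the documents that passed the filter.
  { $sort: { calories_per_serving: 1 } }
];

// In mongosh, against a "cookbook" collection, this would run as:
// db.cookbook.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

A $match placed at the very start of a pipeline can also take advantage of indexes, in the same way a regular find query can.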
Let's start with a simple example. Suppose you have a collection of recipes called "cookbook". Here is an example of what one of those documents might look like:
{
  "_id": { "$oid": "636aa9707dd21c28fda493a3" },
  "title": "Toast",
  "calories_per_serving": 75,
  "prep_time": 1,
  "cook_time": 4,
  "ingredients": [
    {
      "name": "bread",
      "quantity": { "amount": 4, "unit": "slice" },
      "vegetarian": true
    },
    {
      "name": "butter",
      "quantity": { "amount": 2, "unit": "tablespoon" },
      "vegetarian": true
    }
  ],
  "directions": [
    "Toast bread.",
    "When both sides are an even golden brown, butter one side, care being taken to butter the edges.",
    "Melt butter.",
    "Serve hot."
  ],
  "rating": [ 5 ],
  "rating_avg": 5,
  "servings": 4,
  "tags": [ "bread", "quick", "vegetarian" ],
  "type": "Breakfast",
  "vegetarian_option": true
}
Now, suppose you want to extract only those recipes that have a rating field. Here's how you can achieve this using the $match stage:
db.cookbook.aggregate([
  { "$match": { "rating": { "$exists": true } } }
])
In this example, we use the $exists operator to filter out recipes that lack a rating field.
The result is the set of recipes that have a rating field; any document without one is excluded.
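To make the effect of the $exists filter concrete, here is a small plain-JavaScript emulation run against a made-up in-memory sample. It is only a sketch of the outcome, not how MongoDB evaluates the stage, and the sample documents (including "Porridge" and "Pancakes") are hypothetical. One detail worth noticing: $exists: true keeps documents that contain the field even when its value is null.

```javascript
// Plain-JavaScript emulation of { rating: { $exists: true } }.
// This is NOT MongoDB's implementation; it mirrors the outcome on a
// small, made-up in-memory sample.
const sampleRecipes = [
  { title: "Toast", rating: [5] },
  { title: "Porridge" },               // no rating field: filtered out
  { title: "Pancakes", rating: null }  // field present (null): kept
];

// $exists: true keeps documents that contain the field, even if it is null.
const withRating = sampleRecipes.filter((doc) =>
  Object.prototype.hasOwnProperty.call(doc, "rating")
);

console.log(withRating.map((doc) => doc.title)); // [ 'Toast', 'Pancakes' ]
```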
Going Deeper: Multiple Criteria Filtering
The $match stage isn't limited to a single criterion. You can combine multiple conditions to refine your document selection. For instance, suppose you want to find "Breakfast" recipes with a vegetarian option and a rating. Here's how you can construct your query:
db.cookbook.aggregate([
  {
    "$match": {
      "rating": { "$exists": true },
      "type": "Breakfast",
      "vegetarian_option": true
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "avgRating": { "$round": [{ "$avg": "$rating" }, 2] }
    }
  }
])
In this example, we keep only recipes that meet three criteria: they must have a rating field, be of type "Breakfast", and offer a vegetarian option.
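Listing several fields in one $match document combines them with an implicit AND. As a sketch, the two stage definitions below express the same three-condition filter, once implicitly and once with the $and operator spelled out. The variable names are hypothetical, and the stages are built as plain objects so the snippet runs without a database.

```javascript
// Two ways to express the same three-condition filter.
// Several fields in one $match document are combined with an implicit AND.
const implicitAnd = {
  $match: {
    rating: { $exists: true },
    type: "Breakfast",
    vegetarian_option: true
  }
};

// The same filter with the $and operator written out explicitly.
const explicitAnd = {
  $match: {
    $and: [
      { rating: { $exists: true } },
      { type: "Breakfast" },
      { vegetarian_option: true }
    ]
  }
};

// In mongosh, either stage could be passed to db.cookbook.aggregate([...]).
console.log(Object.keys(implicitAnd.$match).length, explicitAnd.$match.$and.length); // 3 3
```

The implicit form is shorter and is usually enough; the explicit $and form becomes necessary when the same field or operator has to appear more than once in the filter.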
We then have another stage, $project, which further reshapes our results.
Shaping Results With the $project Stage
The $project stage is used to reshape the documents passing through the pipeline, typically by specifying which fields to include or exclude from the output documents and by computing new values from existing fields.
The $project stage in the pipeline above performs the following actions:
"_id": 0: This excludes the _id field from the output documents. MongoDB includes _id by default, so it has to be switched off explicitly with 0.
"title": 1: This includes the title field in the output documents. The 1 is a flag meaning "keep this field"; it does not change the field's value.
"avgRating": { "$round": [{ "$avg": "$rating" }, 2] }: This creates a new field called avgRating in the output documents.
The value of this field is calculated in two steps. First, the $avg operator computes the average of the numbers in each document's rating array. Then the $round operator rounds that average to two decimal places. So, the avgRating field will contain each recipe's average rating rounded to two decimal places.
In short, this $project stage produces output documents that omit the _id field, include the title field, and add a new avgRating field containing the average of the values in each input document's rating field, rounded to two decimal places.
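For readers who want to see the reshaping step by step, here is a plain-JavaScript emulation of that $project stage applied to a made-up in-memory sample. It is a sketch only, not MongoDB's implementation: the sample documents and the helper names average and roundTo2 are hypothetical, and the rounding helper is an approximation of $round (see the comment in the code).

```javascript
// Plain-JavaScript emulation of the $project stage described above,
// applied to a made-up in-memory sample. Not MongoDB's implementation.
const matchedRecipes = [
  { _id: 1, title: "Toast", rating: [5] },
  { _id: 2, title: "Omelette", rating: [5, 4, 4] }
];

// Average of an array of numbers; null for an empty array, mirroring
// $avg returning null when there are no numeric values to average.
function average(values) {
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Round to two decimal places. Note: MongoDB's $round rounds midpoints to
// the nearest even digit, while Math.round rounds them up, so results can
// differ for values that sit exactly on a midpoint.
function roundTo2(value) {
  return value === null ? null : Math.round(value * 100) / 100;
}

// Drop _id, keep title, and add the computed avgRating field.
const projected = matchedRecipes.map((doc) => ({
  title: doc.title,
  avgRating: roundTo2(average(doc.rating))
}));

console.log(projected);
// [ { title: 'Toast', avgRating: 5 }, { title: 'Omelette', avgRating: 4.33 } ]
```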
Conclusion
The $match and $project stages are invaluable tools in your MongoDB aggregation arsenal. They let you filter documents based on your specific criteria and shape the output, allowing you to focus on the data that matters most.
Whether you're extracting documents with certain fields, multiple conditions, or anything in between, the $match and $project stages streamline your data pipelines and enhance your MongoDB experience.