Introduction
MongoDB's aggregation pipeline is the engine behind any non-trivial analytics query. It processes documents through a sequence of stages — each stage transforms the documents and passes them to the next. Think of it as a Unix pipe for documents: db.collection.aggregate([stage1, stage2, stage3, ...]).
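The aggregation framework supersedes the older mapReduce command, which has been deprecated since MongoDB 5.0. Pipelines are generally faster and easier to read than equivalent map-reduce jobs, and they support distributed execution in sharded clusters.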
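To make the pipe analogy concrete, here is a minimal sketch in plain JavaScript with no MongoDB involved. The helper names (matchStage, sortStage, limitStage, runPipeline) are illustrative assumptions, not part of any MongoDB API; they only model how each stage receives the output of the previous one.

```javascript
// Each "stage" is a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, left to right.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const sampleOrders = [
  { status: "completed", total: 1057 },
  { status: "pending", total: 40 },
  { status: "completed", total: 250 },
  { status: "completed", total: 75 },
];

const topCompleted = runPipeline(sampleOrders, [
  matchStage((o) => o.status === "completed"),
  sortStage((a, b) => b.total - a.total),
  limitStage(2),
]);

console.log(topCompleted.map((o) => o.total)); // [ 1057, 250 ]
```

The real server does considerably more (index use, stage reordering, streaming), so treat this purely as a mental model of the data flow.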
Sample Data
Throughout this guide we use an e-commerce database with orders and customers collections:
// orders collection
{
_id: ObjectId("..."),
customerId: ObjectId("..."),
status: "completed",
items: [
{ product: "Laptop", price: 999, qty: 1 },
{ product: "Mouse", price: 29, qty: 2 }
],
total: 1057,
createdAt: ISODate("2026-01-15")
}
// customers collection
{
_id: ObjectId("..."),
name: "Alice Chen",
email: "alice@example.com",
country: "US",
tier: "premium"
}
$match — Filter Documents Early
$match uses standard MongoDB query syntax to filter documents. Always place it as early in the pipeline as possible to reduce the number of documents subsequent stages must process.
// Get completed orders in January 2026
db.orders.aggregate([
{
$match: {
status: "completed",
createdAt: {
$gte: ISODate("2026-01-01"),
$lt: ISODate("2026-02-01")
},
total: { $gte: 100 }
}
}
])
A leading $match on an indexed field can use the index, making the query as fast as a plain find().
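Date ranges like the one above are easy to get wrong at month and year boundaries. The following sketch builds the same half-open range ($gte start, $lt next month) programmatically. The function name monthMatch and its parameters are hypothetical, and it assumes createdAt is stored in UTC.

```javascript
// Hypothetical helper: builds a $match stage covering one calendar month (UTC).
// `month` is 1-based (1 = January); `extra` holds any additional filters.
function monthMatch(year, month, extra = {}) {
  // Date.UTC takes a zero-based month, and month index 12 rolls over
  // into January of the following year.
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1));
  return { $match: { ...extra, createdAt: { $gte: start, $lt: end } } };
}

const januaryStage = monthMatch(2026, 1, { status: "completed" });
console.log(januaryStage.$match.createdAt.$gte.toISOString()); // 2026-01-01T00:00:00.000Z
console.log(januaryStage.$match.createdAt.$lt.toISOString());  // 2026-02-01T00:00:00.000Z
```

The resulting object can be passed as the first element of the array given to db.orders.aggregate().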
$group — Aggregate by Key
$group collapses documents into groups. The _id field defines the grouping key; all other fields are computed with accumulator expressions.
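// Revenue by country
// country is stored on the customer, not the order, so join first ($lookup is covered below)
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } },
{ $unwind: "$customer" },
{
$group: {
_id: "$customer.country",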
totalRevenue: { $sum: "$total" },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: "$total" },
maxOrder: { $max: "$total" }
}
},
{ $sort: { totalRevenue: -1 } }
])
// Group by multiple fields
{
$group: {
_id: {
year: { $year: "$createdAt" },
month: { $month: "$createdAt" },
status: "$status"
},
count: { $sum: 1 },
revenue: { $sum: "$total" }
}
}
// Group all documents (grand total)
{
$group: {
_id: null,
grandTotal: { $sum: "$total" },
documentCount: { $sum: 1 }
}
}
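For intuition about what the accumulators compute, here is a rough in-memory model in plain JavaScript. The function groupAndAccumulate is an illustrative assumption, not how MongoDB implements $group; it mirrors $sum, $avg, and $max over a single numeric field.

```javascript
// Rough model of $group: keyOf picks the grouping key (the _id),
// valueOf picks the number fed to the accumulators.
function groupAndAccumulate(docs, keyOf, valueOf) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    const g = groups.get(key) ?? { _id: key, total: 0, count: 0, max: -Infinity };
    const value = valueOf(doc);
    g.total += value;               // like { $sum: "$total" }
    g.count += 1;                   // like { $sum: 1 }
    g.max = Math.max(g.max, value); // like { $max: "$total" }
    groups.set(key, g);
  }
  // like { $avg: "$total" }, derived once per group at the end
  return [...groups.values()].map((g) => ({ ...g, avg: g.total / g.count }));
}

const groupedOrders = groupAndAccumulate(
  [
    { status: "completed", total: 100 },
    { status: "completed", total: 300 },
    { status: "refunded", total: 50 },
  ],
  (o) => o.status,
  (o) => o.total
);

console.log(groupedOrders);
// completed: total 400, count 2, max 300, avg 200
// refunded:  total 50,  count 1, max 50,  avg 50
```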
$unwind — Flatten Arrays
$unwind deconstructs an array field: one input document with a 3-element array produces 3 output documents, one per array element. Essential for analyzing nested array data.
// Revenue per product across all orders
db.orders.aggregate([
{ $match: { status: "completed" } },
{
$unwind: {
path: "$items",
preserveNullAndEmptyArrays: true // keep docs with empty/missing arrays (they group under _id: null)
}
},
{
$group: {
_id: "$items.product",
totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
unitsSold: { $sum: "$items.qty" }
}
},
{ $sort: { totalRevenue: -1 } },
{ $limit: 10 }
])
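The effect of $unwind and its preserveNullAndEmptyArrays option can be sketched in plain JavaScript. This is a simplified model under stated assumptions: it handles only non-empty arrays, empty arrays, and missing fields, and the function name unwind is just a stand-in for the stage.

```javascript
// Simplified model of $unwind on a top-level field.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per array element.
      for (const element of value) out.push({ ...doc, [field]: element });
    } else if (preserveNullAndEmptyArrays) {
      // Simplification: emit the document once, without the array field.
      const { [field]: _omitted, ...rest } = doc;
      out.push(rest);
    }
    // Otherwise the document is dropped from the output.
  }
  return out;
}

const orderDocs = [
  { _id: 1, items: [{ product: "Laptop" }, { product: "Mouse" }] },
  { _id: 2, items: [] },
  { _id: 3 },
];

console.log(unwind(orderDocs, "items").length);       // 2
console.log(unwind(orderDocs, "items", true).length); // 4
```

Without the option, orders 2 and 3 silently disappear, which matters whenever later stages count orders rather than items.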
$lookup — Left Outer Join
$lookup joins documents from another collection — MongoDB's equivalent of SQL's LEFT JOIN.
// Basic lookup: join orders with customers
db.orders.aggregate([
{
$lookup: {
from: "customers", // collection to join
localField: "customerId", // field in orders
foreignField: "_id", // field in customers
as: "customer" // output array field
}
},
// customer is an array — unwind to get a single document
{ $unwind: "$customer" }
])
// Advanced lookup with pipeline (MongoDB 3.6+)
{
$lookup: {
from: "customers",
let: { cid: "$customerId" }, // expose the order's customerId to the sub-pipeline as $$cid
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$_id", "$$cid"] },
{ $eq: ["$tier", "premium"] }
]
}
}
},
{ $project: { name: 1, email: 1, tier: 1 } }
],
as: "customer"
}
}
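The left-outer-join behaviour of the basic form can be modelled in a few lines of plain JavaScript. The function leftOuterLookup and the string ids are assumptions for illustration; the point is that every input document survives and unmatched ones get an empty array.

```javascript
// Rough model of a localField/foreignField $lookup.
function leftOuterLookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    // Always an array: all foreign documents whose key equals the local key.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const customerDocs = [{ _id: "c1", name: "Alice Chen", tier: "premium" }];
const orderRows = [
  { _id: "o1", customerId: "c1", total: 1057 },
  { _id: "o2", customerId: "c9", total: 40 }, // no matching customer
];

const joined = leftOuterLookup(orderRows, customerDocs, {
  localField: "customerId",
  foreignField: "_id",
  as: "customer",
});

console.log(joined[0].customer.length); // 1
console.log(joined[1].customer.length); // 0
```

Note that a following { $unwind: "$customer" } drops order o2 because its array is empty, which effectively turns the join into an inner join unless preserveNullAndEmptyArrays is set.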
$project — Reshape Documents
$project controls which fields reach the next stage: 1 includes a field, 0 excludes it, and any other expression computes a new field.
// Include/exclude fields, compute new ones
db.orders.aggregate([
{
$project: {
_id: 0, // exclude
orderId: "$_id", // rename
status: 1, // include
total: 1,
itemCount: { $size: "$items" }, // computed
year: { $year: "$createdAt" },
isLargeOrder: { $gt: ["$total", 500] }
}
}
])
$facet — Multiple Pipelines in One Query
$facet runs multiple sub-pipelines on the same input simultaneously — perfect for dashboards that need several aggregations at once:
db.orders.aggregate([
{ $match: { createdAt: { $gte: ISODate("2026-01-01") } } },
{
$facet: {
// Facet 1: status distribution (all orders in the date range)
byStatus: [
{ $group: { _id: "$status", count: { $sum: 1 } } }
],
// Facet 2: revenue by month (completed orders only)
byMonth: [
{ $match: { status: "completed" } },
{
$group: {
_id: { $dateToString: { format: "%Y-%m", date: "$createdAt" } },
revenue: { $sum: "$total" }
}
},
{ $sort: { _id: 1 } }
],
// Facet 3: top products (completed orders only)
topProducts: [
{ $match: { status: "completed" } },
{ $unwind: "$items" },
{ $group: { _id: "$items.product", total: { $sum: "$items.qty" } } },
{ $sort: { total: -1 } },
{ $limit: 5 }
]
}
}
])
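The shape of the $facet result often surprises people: the stage emits a single document whose fields are arrays, one per sub-pipeline. A minimal plain-JavaScript sketch of that shape follows; the function facet and the sub-pipeline names are hypothetical.

```javascript
// Rough model of $facet: every sub-pipeline receives the same input documents.
function facet(docs, subPipelines) {
  const result = {};
  for (const [name, run] of Object.entries(subPipelines)) {
    result[name] = run(docs);
  }
  return [result]; // exactly one output document
}

const facetInput = [
  { status: "completed", total: 100 },
  { status: "completed", total: 300 },
  { status: "pending", total: 50 },
];

const [dashboard] = facet(facetInput, {
  orderCount: (docs) => [{ count: docs.length }],
  largeOrders: (docs) => docs.filter((o) => o.total >= 100),
});

console.log(dashboard.orderCount[0].count); // 3
console.log(dashboard.largeOrders.length);  // 2
```

Because everything lands in one document, the combined facet output is subject to the 16 MB BSON document limit, so keep each facet small with $limit or $group.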
Complete Example — Monthly Revenue Report
db.orders.aggregate([
// 1. Filter to last 12 months
{
$match: {
createdAt: { $gte: new Date(new Date().setMonth(new Date().getMonth() - 12)) },
status: "completed"
}
},
// 2. Join with customers
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customer"
}
},
{ $unwind: "$customer" },
// 3. Group by month + customer tier
{
$group: {
_id: {
month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } },
tier: "$customer.tier"
},
revenue: { $sum: "$total" },
orders: { $sum: 1 },
uniqueCustomers: { $addToSet: "$customerId" }
}
},
// 4. Add computed fields
{
$addFields: {
uniqueCustomerCount: { $size: "$uniqueCustomers" },
avgOrderValue: { $divide: ["$revenue", "$orders"] }
}
},
// 5. Clean up
{ $project: { uniqueCustomers: 0 } },
{ $sort: { "_id.month": 1, "_id.tier": 1 } }
])
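One subtlety in step 1: the cutoff is computed from the current instant, so the oldest monthly bucket is usually a partial month. If whole months are wanted, the cutoff can be snapped to the first day of the month. The helper below is a sketch under that assumption; startOfMonthMonthsAgo is a hypothetical name, and it works in UTC to match the default time zone of $dateToString.

```javascript
// Hypothetical helper: midnight UTC on the first day of the month
// that lies `months` months before `now`.
function startOfMonthMonthsAgo(months, now = new Date()) {
  // Date.UTC normalises negative month indexes into the previous year(s).
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - months, 1));
}

const reportCutoff = startOfMonthMonthsAgo(12, new Date("2026-01-15T10:30:00Z"));
console.log(reportCutoff.toISOString()); // 2025-01-01T00:00:00.000Z
```

In the pipeline this would replace the inline expression, for example createdAt: { $gte: startOfMonthMonthsAgo(12) }.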
Performance Tips
- $match first: filter early to reduce the document count before expensive stages
- $match on indexed fields: the pipeline optimizer can use indexes for leading $match stages
- $project early: drop unused fields before $sort or $group to reduce memory
- allowDiskUse: set it to true for large aggregations whose stages exceed the 100 MB per-stage memory limit (spilling to disk is the default since MongoDB 6.0)
- Use explain(): db.collection.aggregate([...], { explain: true }) reveals the execution plan
// Enable disk use for large aggregations
db.orders.aggregate(pipeline, { allowDiskUse: true })
// Explain the pipeline
db.orders.explain("executionStats").aggregate(pipeline)
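The first tip can be turned into a small sanity check run before a pipeline is sent to the server. This is a heuristic sketch only: warnIfMatchNotFirst is a hypothetical helper, it inspects nothing but stage names, and a $match that filters on computed fields legitimately has to come later.

```javascript
// Extract the operator name ("$match", "$group", ...) of each stage.
function stageNames(pipeline) {
  return pipeline.map((stage) => Object.keys(stage)[0]);
}

// Returns a warning string, or null when $match is the first stage.
function warnIfMatchNotFirst(pipeline) {
  const firstMatch = stageNames(pipeline).indexOf("$match");
  if (firstMatch === -1) return "pipeline has no $match stage";
  if (firstMatch > 0) {
    return `$match is stage ${firstMatch + 1}; consider moving it to the front`;
  }
  return null;
}

console.log(warnIfMatchNotFirst([{ $unwind: "$items" }, { $match: { status: "completed" } }]));
// $match is stage 2; consider moving it to the front
```

For the authoritative answer on index use, the explain output shown above remains the tool to rely on.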
Summary
The MongoDB aggregation pipeline is one of the most powerful query systems in any database. The essential stages:
- $match: filter early, ideally on indexed fields
- $group: aggregate with accumulators ($sum, $avg, $max, $addToSet)
- $unwind: flatten arrays for per-element analysis
- $lookup: join with other collections
- $project: reshape and compute new fields
- $facet: parallel sub-pipelines for dashboards