MongoDB Aggregation Pipeline — $match, $group, $lookup, $unwind Explained

Master MongoDB aggregation pipeline stages with real-world examples. Covers $match, $group, $lookup, $unwind, $project, $facet, and performance optimization techniques.

Introduction

MongoDB's aggregation pipeline is the engine behind any non-trivial analytics query. It processes documents through a sequence of stages — each stage transforms the documents and passes them to the next. Think of it as a Unix pipe for documents: db.collection.aggregate([stage1, stage2, stage3, ...]).
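
To make the pipe analogy concrete, here is a minimal in-memory sketch in plain JavaScript. It is an illustration only, not how the server executes a pipeline: each stage is modelled as a function from an array of documents to a new array. The helper names (runPipeline, matchStage, sortStage, limitStage) and the sample documents are made up for this example.

```javascript
// Illustration only: a pipeline modelled as an ordered list of functions.
// All helper names here are hypothetical, not part of any MongoDB API.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Stage factories: each returns a function (docs) => docs.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

const orders = [
  { status: "completed", total: 1057 },
  { status: "pending", total: 40 },
  { status: "completed", total: 250 },
  { status: "completed", total: 75 },
];

// Output of each stage becomes the input of the next one.
const topCompleted = runPipeline(orders, [
  matchStage((o) => o.status === "completed"),
  sortStage((a, b) => b.total - a.total),
  limitStage(2),
]);

console.log(topCompleted.map((o) => o.total)); // [ 1057, 250 ]
```

The order of the stages matters in the same way it does on the server: filtering before sorting means the sort only sees three documents instead of four.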

The aggregation framework has superseded the older mapReduce command, which is deprecated as of MongoDB 5.0. Pipelines are generally faster and easier to read than the equivalent map-reduce jobs, and they run across sharded clusters.

Sample Data

Throughout this guide we use an e-commerce database with orders and customers collections:

// orders collection
{
  _id: ObjectId("..."),
  customerId: ObjectId("..."),
  status: "completed",
  items: [
    { product: "Laptop", price: 999, qty: 1 },
    { product: "Mouse", price: 29, qty: 2 }
  ],
  total: 1057,
  createdAt: ISODate("2026-01-15")
}

// customers collection
{
  _id: ObjectId("..."),
  name: "Alice Chen",
  email: "alice@example.com",
  country: "US",
  tier: "premium"
}

$match — Filter Documents Early

$match uses standard MongoDB query syntax to filter documents. Always place it as early in the pipeline as possible to reduce the number of documents subsequent stages must process.

// Get completed orders in January 2026
db.orders.aggregate([
  {
    $match: {
      status: "completed",
      createdAt: {
        $gte: ISODate("2026-01-01"),
        $lt: ISODate("2026-02-01")
      },
      total: { $gte: 100 }
    }
  }
])

A leading $match on an indexed field can use the index, so the pipeline reads only the matching documents instead of scanning the whole collection, much as a find() with the same filter would.
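
The date filter above uses a half-open range: $gte the first instant of the month and $lt the first instant of the next month, so you never need to know how many days the month has. Below is a small sketch of a helper that builds such a $match stage. The name monthMatch is hypothetical, treating createdAt as UTC is an assumption, and plain Date objects stand in for ISODate so the snippet also runs outside mongosh.

```javascript
// Hypothetical helper: build a $match stage covering one calendar month (UTC).
// month is 1-based (1 = January); extra holds any additional filter fields.
function monthMatch(year, month, extra = {}) {
  const start = new Date(Date.UTC(year, month - 1, 1));
  // Date.UTC rolls month index 12 over to January of the following year.
  const end = new Date(Date.UTC(year, month, 1));
  return { $match: { ...extra, createdAt: { $gte: start, $lt: end } } };
}

const pipeline = [
  monthMatch(2026, 1, { status: "completed", total: { $gte: 100 } }),
];

console.log(JSON.stringify(pipeline, null, 2));
// In mongosh, the same array can be passed straight to db.orders.aggregate(pipeline).
```

For this particular filter, a compound index such as { status: 1, createdAt: 1 } is one plausible candidate, but whether it helps depends on your data and should be confirmed with explain().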

$group — Aggregate by Key

$group collapses documents into groups. The _id field defines the grouping key; all other fields are computed with accumulator expressions.

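// Revenue by country
// country lives on customers, not orders, so join first (see the $lookup section below)
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  {
    $group: {
      _id: "$customer.country",
      totalRevenue: { $sum: "$total" },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: "$total" },
      maxOrder: { $max: "$total" }
    }
  },
  { $sort: { totalRevenue: -1 } }
])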

// Group by multiple fields
{
  $group: {
    _id: {
      year: { $year: "$createdAt" },
      month: { $month: "$createdAt" },
      status: "$status"
    },
    count: { $sum: 1 },
    revenue: { $sum: "$total" }
  }
}

// Group all documents (grand total)
{
  $group: {
    _id: null,
    grandTotal: { $sum: "$total" },
    documentCount: { $sum: 1 }
  }
}
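
To check your understanding of the accumulators, the following plain-JavaScript sketch computes by hand what $sum, $avg, and $max would produce for a tiny in-memory sample. The groupTotals helper and the sample documents are made up for illustration; the code mimics the semantics only and says nothing about how the server implements $group.

```javascript
// Illustration only: emulate
//   { $group: { _id: "$<keyField>", totalRevenue: {$sum: "$total"}, ... } }
// on an in-memory array. groupTotals is a hypothetical helper.
function groupTotals(docs, keyField) {
  const groups = new Map();
  for (const doc of docs) {
    // Documents missing the key field are grouped under _id: null.
    const key = doc[keyField] ?? null;
    const g = groups.get(key) ??
      { _id: key, totalRevenue: 0, orderCount: 0, maxOrder: -Infinity };
    g.totalRevenue += doc.total;                  // $sum: "$total"
    g.orderCount += 1;                            // $sum: 1
    g.maxOrder = Math.max(g.maxOrder, doc.total); // $max: "$total"
    groups.set(key, g);
  }
  // $avg is the running total divided by the count once every document is seen.
  return [...groups.values()].map((g) => ({
    ...g,
    avgOrderValue: g.totalRevenue / g.orderCount,
  }));
}

const sample = [
  { status: "completed", total: 100 },
  { status: "completed", total: 300 },
  { status: "cancelled", total: 50 },
];

console.log(groupTotals(sample, "status"));
```

Unlike this sketch, which returns groups in insertion order, $group makes no promise about output order, which is why the pipelines above add an explicit $sort.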

$unwind — Flatten Arrays

$unwind deconstructs an array field: one input document with a 3-element array produces 3 output documents, one per array element. Essential for analyzing nested array data.

// Revenue per product across all orders
db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $unwind: {
      path: "$items",
      preserveNullAndEmptyArrays: true  // keep docs with empty/missing arrays
    }
  },
  {
    $group: {
      _id: "$items.product",
      totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
      unitsSold: { $sum: "$items.qty" }
    }
  },
  { $sort: { totalRevenue: -1 } },
  { $limit: 10 }
])
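
The effect of $unwind, and of the preserveNullAndEmptyArrays option in particular, is easiest to see on a handful of documents. The sketch below emulates the stage in plain JavaScript for a top-level field. The unwind function and the sample orders are hypothetical and exist only to illustrate the semantics.

```javascript
// Illustration only: emulate $unwind on a top-level array field in memory.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per element; all other fields are copied unchanged.
      return value.map((element) => ({ ...doc, [field]: element }));
    }
    if (value === undefined || value === null || Array.isArray(value)) {
      // Missing, null, or empty array: dropped unless explicitly preserved.
      if (!preserveNullAndEmptyArrays) return [];
      if (value === null) return [doc]; // null stays null
      const { [field]: _omitted, ...rest } = doc; // empty array: field is omitted
      return [rest];
    }
    return [doc]; // a non-array value behaves like a single-element array
  });
}

const orders = [
  { _id: 1, items: [{ product: "Laptop", qty: 1 }, { product: "Mouse", qty: 2 }] },
  { _id: 2, items: [] },
  { _id: 3 },
];

console.log(unwind(orders, "items").length);       // 2
console.log(unwind(orders, "items", true).length); // 4
```

With the option enabled, orders 2 and 3 survive but carry no items field, so a later $group on "$items.product" would place them under _id: null. Whether that bucket is wanted depends on the report.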

$lookup — Left Outer Join

$lookup joins documents from another collection — MongoDB's equivalent of SQL's LEFT JOIN.

// Basic lookup: join orders with customers
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",           // collection to join
      localField: "customerId",    // field in orders
      foreignField: "_id",         // field in customers
      as: "customer"               // output array field
    }
  },
  // customer is an array — unwind to get a single document
  { $unwind: "$customer" }
])

// Advanced lookup with pipeline (MongoDB 3.6+)
{
  $lookup: {
    from: "customers",
    let: { cid: "$customerId" },
    pipeline: [
      {
        $match: {
          $expr: {
            $and: [
              { $eq: ["$_id", "$$cid"] },
              { $eq: ["$tier", "premium"] }
            ]
          }
        }
      },
      { $project: { name: 1, email: 1, tier: 1 } }
    ],
    as: "customer"
  }
}
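
One consequence of the left-outer-join behaviour is easy to miss: $lookup keeps every order, but a plain $unwind on the joined array afterwards discards orders that found no customer. The plain-JavaScript sketch below emulates an equality lookup to show this. The lookup function is hypothetical, and string ids stand in for ObjectId values so that === comparison works.

```javascript
// Illustration only: emulate an equality $lookup in memory.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    // Every local document is kept; unmatched ones get an empty array.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const customers = [{ _id: "c1", name: "Alice Chen", tier: "premium" }];
const orders = [
  { _id: "o1", customerId: "c1", total: 1057 },
  { _id: "o2", customerId: "c9", total: 40 }, // no such customer
];

const joined = lookup(orders, customers, "customerId", "_id", "customer");
console.log(joined.map((o) => o.customer.length)); // [ 1, 0 ]

// A plain { $unwind: "$customer" } would now drop o2, because its array is empty.
const afterUnwind = joined.filter((o) => o.customer.length > 0);
console.log(afterUnwind.map((o) => o._id)); // [ 'o1' ]
```

If unmatched orders must stay in the result, the object form of $unwind with preserveNullAndEmptyArrays: true keeps them.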

$project — Reshape Documents

// Include/exclude fields, compute new ones
db.orders.aggregate([
  {
    $project: {
      _id: 0,                           // exclude
      orderId: "$_id",                  // rename
      status: 1,                        // include
      total: 1,
      itemCount: { $size: { $ifNull: ["$items", []] } },   // computed; $size errors if items is missing
      year: { $year: "$createdAt" },
      isLargeOrder: { $gt: ["$total", 500] }
    }
  }
])

$facet — Multiple Pipelines in One Query

$facet runs multiple sub-pipelines over the same input documents within a single stage, which suits dashboards that need several aggregations from one query:

db.orders.aggregate([
  // Filter by date rather than status, so that byStatus sees every status
  { $match: { createdAt: { $gte: ISODate("2026-01-01") } } },
  {
    $facet: {
      // Facet 1: status distribution
      byStatus: [
        { $group: { _id: "$status", count: { $sum: 1 } } }
      ],
      // Facet 2: revenue by month (completed orders only)
      byMonth: [
        { $match: { status: "completed" } },
        {
          $group: {
            _id: { $dateToString: { format: "%Y-%m", date: "$createdAt" } },
            revenue: { $sum: "$total" }
          }
        },
        { $sort: { _id: 1 } }
      ],
      // Facet 3: top products (completed orders only)
      topProducts: [
        { $match: { status: "completed" } },
        { $unwind: "$items" },
        { $group: { _id: "$items.product", total: { $sum: "$items.qty" } } },
        { $sort: { total: -1 } },
        { $limit: 5 }
      ]
    }
  }
])
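
The shape of the $facet result often surprises people: the stage emits exactly one document, with one array field per facet. The sketch below reads such a result in plain JavaScript. The facetResult object is hand-written with made-up numbers purely to show the shape; it is not output from a real database.

```javascript
// Hypothetical result, shaped like the output of the $facet stage above:
// a single document with one array field per facet. All numbers are made up.
const facetResult = [
  {
    byStatus: [{ _id: "completed", count: 3 }],
    byMonth: [
      { _id: "2026-01", revenue: 1500 },
      { _id: "2026-02", revenue: 900 },
    ],
    topProducts: [
      { _id: "Mouse", total: 7 },
      { _id: "Laptop", total: 2 },
    ],
  },
];

// In mongosh this would be:
//   const [report] = db.orders.aggregate(pipeline).toArray();
const [report] = facetResult;

// Turn the byMonth array into a lookup object keyed by "YYYY-MM".
const revenueByMonth = Object.fromEntries(
  report.byMonth.map((m) => [m._id, m.revenue])
);

console.log(revenueByMonth);
console.log(report.topProducts[0]._id);
```

Because everything comes back in one document, the combined facet output has to fit within the 16 MB BSON document limit, so keep a $limit on facets that could grow large.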

Complete Example — Monthly Revenue Report

db.orders.aggregate([
  // 1. Filter to last 12 months
  {
    $match: {
      createdAt: { $gte: new Date(new Date().setMonth(new Date().getMonth() - 12)) },
      status: "completed"
    }
  },
  // 2. Join with customers
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  // 3. Group by month + customer tier
  {
    $group: {
      _id: {
        month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } },
        tier: "$customer.tier"
      },
      revenue: { $sum: "$total" },
      orders: { $sum: 1 },
      uniqueCustomers: { $addToSet: "$customerId" }
    }
  },
  // 4. Add computed fields
  {
    $addFields: {
      uniqueCustomerCount: { $size: "$uniqueCustomers" },
      avgOrderValue: { $divide: ["$revenue", "$orders"] }
    }
  },
  // 5. Clean up
  { $project: { uniqueCustomers: 0 } },
  { $sort: { "_id.month": 1, "_id.tier": 1 } }
])
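
The cutoff date in step 1 is written as a one-liner that relies on setMonth mutating a Date and returning a millisecond timestamp. A named helper can make the intent easier to follow. The sketch below is one way to write it; monthsAgo is a hypothetical name, and working in UTC is an assumption about how createdAt is stored.

```javascript
// Hypothetical helper: the instant n calendar months before `from`, in UTC.
function monthsAgo(n, from = new Date()) {
  const d = new Date(from.getTime()); // copy so the caller's Date is not mutated
  d.setUTCMonth(d.getUTCMonth() - n); // negative month indexes roll back a year
  return d;
}

const cutoff = monthsAgo(12, new Date("2026-01-15T00:00:00Z"));
console.log(cutoff.toISOString()); // 2025-01-15T00:00:00.000Z

// The first stage of the report could then be written as:
const firstStage = {
  $match: { createdAt: { $gte: cutoff }, status: "completed" },
};
```

As with the original one-liner, a day that does not exist in the target month (for example the 31st) rolls forward into the following month. If the report should start on a month boundary, set the day to 1 before subtracting.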

Performance Tips

  • $match first — filter early to reduce document count before expensive stages
  • $match on indexed fields — the pipeline optimizer can use indexes for leading $match stages
  • $project sparingly — the optimizer already works out which fields later stages need, so an early $project mainly helps when it drops large fields the pipeline never reads
  • allowDiskUse: true — lets stages such as $sort and $group that exceed the 100 MB per-stage memory limit spill to temporary files (MongoDB 6.0+ does this by default)
  • Use explain() — db.collection.aggregate([...], { explain: true }) reveals the execution plan

// Enable disk use for large aggregations
db.orders.aggregate(pipeline, { allowDiskUse: true })

// Explain the pipeline
db.orders.explain("executionStats").aggregate(pipeline)
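
Explain output is deeply nested, and its exact layout varies between server versions. One rough way to spot a collection scan is to collect every plan stage name and look for COLLSCAN. The sketch below assumes stage names appear under keys called stage, which is how winning plans are commonly reported; collectStages is a hypothetical helper, and the sample fragment is hand-written for illustration rather than real server output.

```javascript
// Hypothetical helper: collect every string stored under a "stage" key,
// wherever it appears inside an explain document.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "stage" && typeof value === "string") found.push(value);
      else collectStages(value, found);
    }
  }
  return found;
}

// Hand-written, heavily trimmed fragment for illustration (not real output).
const explainFragment = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "status_1_createdAt_1" },
    },
  },
};

const stages = collectStages(explainFragment);
console.log(stages);                      // [ 'FETCH', 'IXSCAN' ]
console.log(stages.includes("COLLSCAN")); // false
```

In mongosh, the document returned by db.orders.explain("executionStats").aggregate(pipeline) could be passed to the same function. Treat the result as a quick hint and read the full plan before drawing conclusions.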

Summary

The aggregation pipeline handles filtering, grouping, joining, and reshaping in a single server-side query. The essential stages:

  • $match — filter (as early as possible, ideally on indexed fields)
  • $group — aggregate with accumulators ($sum, $avg, $max, $addToSet)
  • $unwind — flatten arrays for per-element analysis
  • $lookup — join with other collections
  • $project — reshape and compute new fields
  • $facet — several sub-pipelines over the same input, for dashboards