MongoDB Aggregation Pipelines: Move Logic Into the DB

The In-Memory Trap Every MERN Developer Falls Into

It starts with something reasonable. You need to show a sales summary on a dashboard — total orders this month, average order value, a breakdown by status. So you write what feels natural:

const orders = await Order.find({ status: { $in: ['completed', 'delivered'] } })
const total = orders.reduce((sum, o) => sum + o.total, 0)
const avg = total / orders.length

This works at a hundred orders. It gets uncomfortable at ten thousand. At a hundred thousand, your Node process is allocating hundreds of megabytes for a query that should return three numbers. You've turned your application server into a secondary database engine — and a slow one at that.

The pattern shows up everywhere in early MERN codebases: fetch a big collection, sort or filter it in JavaScript, send the result downstream. Each layer adds latency and memory pressure that accumulates invisibly until it doesn't. The fix isn't faster hardware or a different ORM. The fix is to stop asking MongoDB for raw documents and start asking it for answers. That's what the aggregation pipeline is for.

What the Pipeline Actually Is

MongoDB's aggregation pipeline is a sequence of transformation stages. Documents flow in at one end, each stage filters or reshapes them, and what emerges at the other end is exactly what your application needs — no further processing required in Node.

The key is that this computation runs inside the database, close to the data, where MongoDB can leverage indexes, parallel execution, and efficient cursor-based streaming. By the time the result crosses the network to your application server, it's already the right shape.

The core stages and what they do:

$match — filters documents using the same query syntax as .find(), and equally index-aware
$group — aggregates by a key, with accumulators like $sum, $avg, $min, $max, $push
$project — reshapes documents: rename fields, add computed fields, drop what you don't need
$sort / $limit / $skip — ordering and pagination
$lookup — a left outer join from another collection
$facet — runs multiple sub-pipelines in a single pass, useful for pagination with totals
$addFields — adds computed fields without dropping existing ones

Stages compose. The output of $match feeds into $group, which feeds into $project. MongoDB's query planner optimizes the execution order where it can — in particular, it will push $match and $sort stages earlier in the pipeline when doing so allows index use.

A Real Example: Daily Revenue Stats

Here's the aggregation that replaces the naive in-memory reduce from above. It returns daily revenue totals for a date range — the kind of data that feeds the dashboard in something like OrderX:

const startOfMonth = new Date(year, month, 1)
const endOfMonth = new Date(year, month + 1, 0, 23, 59, 59)

const dailyStats = await Order.aggregate([
  {
    $match: {
      createdAt: { $gte: startOfMonth, $lte: endOfMonth },
      status: { $in: ['completed', 'delivered'] },
    },
  },
  {
    $group: {
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$createdAt' } },
      revenue: { $sum: '$total' },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: '$total' },
    },
  },
  {
    $project: {
      _id: 0,
      date: '$_id',
      revenue: 1,
      orderCount: 1,
      avgOrderValue: { $round: ['$avgOrderValue', 2] },
    },
  },
  { $sort: { date: 1 } },
])

MongoDB executes $match first — and if createdAt is indexed, it uses that index to scan only the relevant documents. The $group aggregates entirely in the database. What travels over the wire is an array of thirty-odd objects, not ten thousand order documents. That difference compounds with every request.

Notice $round inside $project: you can apply arithmetic, string formatting, and date transformations directly in the pipeline. Any computation you'd otherwise do in JavaScript after the query can usually be expressed here.

Joining Collections Without N+1

The other place in-memory work creeps in is relational queries. You have a products collection and an orders collection. Each order line has a productId. You need the product name alongside each order.

The tempting path: fetch all orders, then loop and call Product.findById() for each one. That's an N+1 — one extra query per document, and each round trip adds latency.

$lookup solves this in a single query:

const ordersWithProducts = await Order.aggregate([
  { $match: { userId: new mongoose.Types.ObjectId(userId) } },
  {
    $lookup: {
      from: 'products',
      localField: 'productId',
      foreignField: '_id',
      as: 'product',
    },
  },
  { $unwind: '$product' },
  {
    $project: {
      _id: 1,
      total: 1,
      createdAt: 1,
      'product.name': 1,
      'product.sku': 1,
    },
  },
])

One query. MongoDB performs the join server-side. Note the $unwind after $lookup — the lookup stage always produces an array for the joined field, even for a one-to-one relationship. $unwind flattens it to a scalar so you can reference product.name directly in subsequent stages.

For one-to-many relationships where you want all matched items as an array, skip the $unwind and you get the array behavior naturally.

Pagination With Totals in One Round Trip

Standard MERN pagination typically means two queries: one for the page of data, one for the total document count. $facet lets you run both sub-pipelines simultaneously against the same filtered set:

const [result] = await Product.aggregate([
  { $match: { category, inStock: true } },
  {
    $facet: {
      data: [
        { $sort: { createdAt: -1 } },
        { $skip: (page - 1) * pageSize },
        { $limit: pageSize },
      ],
      totalCount: [{ $count: 'count' }],
    },
  },
])

const { data, totalCount } = result
const total = totalCount[0]?.count ?? 0

Both branches share the same $match — MongoDB doesn't re-scan the collection for each facet. One database round trip, two results.

Using Aggregations With Mongoose

Mongoose's .aggregate() method accepts the same pipeline array you'd pass to the raw driver. One thing to know: Mongoose does not apply schema transforms, virtuals, or getter functions to aggregation results. What comes back is plain JavaScript objects shaped entirely by your $project stage.

For dashboard queries and reporting endpoints, that's almost always what you want — you're building view models, not working with Mongoose document instances. Just don't expect .toJSON() transforms or instance methods on the result. Keep aggregation outputs as plain data; keep your Mongoose document API for single-document CRUD where you need it.

One practical note on indexes: $match at the start of a pipeline uses indexes exactly as .find() does. If you're running aggregations on large collections, make sure the fields you filter on early are indexed. An unindexed $match on a million-document collection is just as slow as an unindexed find.

Takeaway

The aggregation pipeline is how MongoDB is meant to be used for anything beyond a simple read. If your application code is filtering, sorting, summarizing, joining, or paginating data that came back from a .find() — you have a pipeline waiting to be written. The database is faster at it, your Node process uses less memory, and the intent of the query is often clearer than a chain of .map() and .reduce() calls applied to the wrong data layer.

Start with $match and $group. The rest follows naturally from there.