Episode 13 of 17

Aggregation Framework

Master the MongoDB aggregation framework — a powerful data processing pipeline. Learn $match, $group, $project, $sort, $unwind, $lookup (joins), $addFields, and $bucket for analytics, reporting, and complex data transformations.

The aggregation framework is MongoDB's answer to SQL's GROUP BY, JOIN, and subqueries — but far more powerful. It processes data through a pipeline of stages, where each stage transforms the data and passes it to the next. Think of it as a data processing assembly line.

Pipeline Concept

db.collection.aggregate([
    { $match: { ... } },     // Stage 1: Filter
    { $group: { ... } },     // Stage 2: Group
    { $sort: { ... } },      // Stage 3: Sort
    { $limit: 10 }           // Stage 4: Limit
])

Data flows through each stage sequentially. Each stage receives the output of the previous stage as input.

$match — Filter Documents

// Filter first for performance (reduces data for later stages)
db.orders.aggregate([
    { $match: { status: "completed", total: { $gte: 1000 } } }
])

$group — Group and Aggregate

// Total sales per category
db.orders.aggregate([
    {
        $group: {
            _id: "$category",              // Group by category
            totalSales: { $sum: "$total" },
            avgOrder: { $avg: "$total" },
            orderCount: { $sum: 1 },
            maxOrder: { $max: "$total" },
            minOrder: { $min: "$total" }
        }
    }
])

// Group by multiple fields
db.orders.aggregate([
    {
        $group: {
            _id: { category: "$category", year: { $year: "$createdAt" } },
            totalSales: { $sum: "$total" },
            count: { $sum: 1 }
        }
    }
])

// Group all documents (no grouping field)
db.orders.aggregate([
    {
        $group: {
            _id: null,                      // No grouping
            totalRevenue: { $sum: "$total" },
            totalOrders: { $sum: 1 }
        }
    }
])

$project — Reshape Documents

db.users.aggregate([
    {
        $project: {
            _id: 0,
            fullName: { $concat: ["$firstName", " ", "$lastName"] },
            email: 1,
            ageGroup: {
                $switch: {
                    branches: [
                        { case: { $lt: ["$age", 18] }, then: "Minor" },
                        { case: { $lt: ["$age", 30] }, then: "Young Adult" },
                        { case: { $lt: ["$age", 50] }, then: "Adult" }
                    ],
                    default: "Senior"
                }
            }
        }
    }
])

$sort and $limit

// Top 5 customers by total spending
db.orders.aggregate([
    { $group: {
        _id: "$customerId",
        totalSpent: { $sum: "$total" }
    }},
    { $sort: { totalSpent: -1 } },
    { $limit: 5 }
])

$unwind — Flatten Arrays

// Flatten the items array to analyze individual products
db.orders.aggregate([
    { $unwind: "$items" },
    { $group: {
        _id: "$items.product",
        totalQuantity: { $sum: "$items.qty" },
        totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } }
    }},
    { $sort: { totalRevenue: -1 } }
])

$lookup — Join Collections

// Join orders with customer details
db.orders.aggregate([
    {
        $lookup: {
            from: "users",               // Collection to join with
            localField: "customerId",    // Field in orders
            foreignField: "_id",         // Field in users
            as: "customerDetails"        // Output array field
        }
    },
    { $unwind: "$customerDetails" },     // Flatten the joined array
    {
        $project: {
            orderNumber: 1,
            total: 1,
            "customerDetails.name": 1,
            "customerDetails.email": 1
        }
    }
])

$addFields — Add Computed Fields

db.products.aggregate([
    {
        $addFields: {
            discountedPrice: {
                $multiply: ["$price", { $subtract: [1, { $divide: ["$discount", 100] }] }]
            },
            isExpensive: { $gte: ["$price", 1000] }
        }
    }
])

$bucket — Group into Ranges

// Group products by price ranges
db.products.aggregate([
    {
        $bucket: {
            groupBy: "$price",
            boundaries: [0, 100, 500, 1000, 5000, Infinity],
            default: "Other",
            output: {
                count: { $sum: 1 },
                products: { $push: "$name" },
                avgPrice: { $avg: "$price" }
            }
        }
    }
])

Real-World Example: Sales Dashboard

db.orders.aggregate([
    // Filter completed orders from this year
    { $match: {
        status: "completed",
        createdAt: { $gte: new Date("2024-01-01") }
    }},

    // Group by month
    { $group: {
        _id: { $month: "$createdAt" },
        revenue: { $sum: "$total" },
        orders: { $sum: 1 },
        avgOrderValue: { $avg: "$total" }
    }},

    // Add month name
    { $addFields: {
        month: {
            $arrayElemAt: [
                ["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
                "$_id"
            ]
        }
    }},

    // Sort by month
    { $sort: { _id: 1 } },

    // Clean up output
    { $project: { _id: 0, month: 1, revenue: 1, orders: 1, avgOrderValue: { $round: ["$avgOrderValue", 2] } } }
])

What's Next

The aggregation framework is incredibly powerful for analytics and reporting. In the next episode, we'll cover backup and restore operations — essential skills for protecting your data in production environments.

TutorialMongoDBDatabaseNoSQL