Aggregation Framework
Master the MongoDB aggregation framework — a powerful data processing pipeline. Learn $match, $group, $project, $sort, $unwind, $lookup (joins), $addFields, and $bucket for analytics, reporting, and complex data transformations.
The aggregation framework is MongoDB's answer to SQL's GROUP BY, JOIN, and subqueries — but far more powerful. It processes data through a pipeline of stages, where each stage transforms the data and passes it to the next. Think of it as a data processing assembly line.
Pipeline Concept
db.collection.aggregate([
{ $match: { ... } }, // Stage 1: Filter
{ $group: { ... } }, // Stage 2: Group
{ $sort: { ... } }, // Stage 3: Sort
{ $limit: 10 } // Stage 4: Limit
])
Data flows through each stage sequentially. Each stage receives the output of the previous stage as input.
$match — Filter Documents
// Filter first for performance (reduces data for later stages)
db.orders.aggregate([
{ $match: { status: "completed", total: { $gte: 1000 } } }
])
$group — Group and Aggregate
// Total sales per category
db.orders.aggregate([
{
$group: {
_id: "$category", // Group by category
totalSales: { $sum: "$total" },
avgOrder: { $avg: "$total" },
orderCount: { $sum: 1 },
maxOrder: { $max: "$total" },
minOrder: { $min: "$total" }
}
}
])
// Group by multiple fields
db.orders.aggregate([
{
$group: {
_id: { category: "$category", year: { $year: "$createdAt" } },
totalSales: { $sum: "$total" },
count: { $sum: 1 }
}
}
])
// Group all documents (no grouping field)
db.orders.aggregate([
{
$group: {
_id: null, // No grouping
totalRevenue: { $sum: "$total" },
totalOrders: { $sum: 1 }
}
}
])
$project — Reshape Documents
db.users.aggregate([
{
$project: {
_id: 0,
fullName: { $concat: ["$firstName", " ", "$lastName"] },
email: 1,
ageGroup: {
$switch: {
branches: [
{ case: { $lt: ["$age", 18] }, then: "Minor" },
{ case: { $lt: ["$age", 30] }, then: "Young Adult" },
{ case: { $lt: ["$age", 50] }, then: "Adult" }
],
default: "Senior"
}
}
}
}
])
$sort and $limit
// Top 5 customers by total spending
db.orders.aggregate([
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$total" }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 5 }
])
$unwind — Flatten Arrays
// Flatten the items array to analyze individual products
db.orders.aggregate([
{ $unwind: "$items" },
{ $group: {
_id: "$items.product",
totalQuantity: { $sum: "$items.qty" },
totalRevenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } }
}},
{ $sort: { totalRevenue: -1 } }
])
$lookup — Join Collections
// Join orders with customer details
db.orders.aggregate([
{
$lookup: {
from: "users", // Collection to join with
localField: "customerId", // Field in orders
foreignField: "_id", // Field in users
as: "customerDetails" // Output array field
}
},
{ $unwind: "$customerDetails" }, // Flatten the joined array
{
$project: {
orderNumber: 1,
total: 1,
"customerDetails.name": 1,
"customerDetails.email": 1
}
}
])
$addFields — Add Computed Fields
db.products.aggregate([
{
$addFields: {
discountedPrice: {
$multiply: ["$price", { $subtract: [1, { $divide: ["$discount", 100] }] }]
},
isExpensive: { $gte: ["$price", 1000] }
}
}
])
$bucket — Group into Ranges
// Group products by price ranges
db.products.aggregate([
{
$bucket: {
groupBy: "$price",
boundaries: [0, 100, 500, 1000, 5000, Infinity],
default: "Other",
output: {
count: { $sum: 1 },
products: { $push: "$name" },
avgPrice: { $avg: "$price" }
}
}
}
])
Real-World Example: Sales Dashboard
db.orders.aggregate([
// Filter completed orders from this year
{ $match: {
status: "completed",
createdAt: { $gte: new Date("2024-01-01") }
}},
// Group by month
{ $group: {
_id: { $month: "$createdAt" },
revenue: { $sum: "$total" },
orders: { $sum: 1 },
avgOrderValue: { $avg: "$total" }
}},
// Add month name
{ $addFields: {
month: {
$arrayElemAt: [
["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
"$_id"
]
}
}},
// Sort by month
{ $sort: { _id: 1 } },
// Clean up output
{ $project: { _id: 0, month: 1, revenue: 1, orders: 1, avgOrderValue: { $round: ["$avgOrderValue", 2] } } }
])
What's Next
The aggregation framework is incredibly powerful for analytics and reporting. In the next episode, we'll cover backup and restore operations — essential skills for protecting your data in production environments.