SQL remains essential for structured data. This tutorial covers advanced joins, aggregations, CTEs and indexes in PostgreSQL, then shows how to model the same data in MongoDB.
A simple e-commerce system: users, orders, products and order items.
-- Users
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
country VARCHAR(50) DEFAULT 'US',
created_at TIMESTAMP DEFAULT NOW()
);
-- Product categories
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT REFERENCES categories(id) -- Self-reference for hierarchy
);
-- Products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10,2) NOT NULL,
stock INT DEFAULT 0,
category_id INT REFERENCES categories(id)
);
-- Orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW(),
shipped_at TIMESTAMP
);
-- Order items
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id),
product_id INT NOT NULL REFERENCES products(id),
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL -- Price at time of purchase
);
Definition: Merging data from multiple tables in a single query, based on a relationship key.
Purpose: Fetch related data without making multiple sequential queries.
Why here: INNER JOIN returns only rows with matches in both tables. LEFT JOIN returns all rows from the left table, even without matches on the right (filled with NULL).
-- โโ INNER JOIN: orders with user details โโ
SELECT
o.id AS order_id,
u.name AS customer,
u.email,
o.status,
o.created_at,
COUNT(oi.id) AS items_count,
SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= NOW() - INTERVAL '30 days'
GROUP BY o.id, u.name, u.email, o.status, o.created_at
ORDER BY total DESC;
-- โโ LEFT JOIN: users WITH or WITHOUT orders โโ
SELECT
u.name,
u.email,
COUNT(o.id) AS total_orders,
COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'completed'
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name, u.email
ORDER BY lifetime_value DESC;
-- โโ SELF JOIN: parent/child categories โโ
SELECT
child.name AS subcategory,
parent.name AS parent_category
FROM categories child
LEFT JOIN categories parent ON child.parent_id = parent.id
ORDER BY parent.name, child.name;
Definition: Function that operates on a subset of rows (window) defined by PARTITION BY, while preserving individual rows (unlike GROUP BY which aggregates them).
Purpose: Calculate partial aggregations (rank, cumulative, previous value) without losing row-level details.
Why here: PARTITION BY divides rows into groups (e.g. by category), ORDER BY sorts them in each partition. RANK() ranks by descending sales, LAG() accesses the previous row.
-- โโ Ranking best-selling products by category โโ
SELECT
c.name AS category,
p.name AS product,
SUM(oi.quantity) AS total_sold,
-- RANK() โ same rank for ties
RANK() OVER (
PARTITION BY c.id
ORDER BY SUM(oi.quantity) DESC
) AS rank_in_category,
-- Cumulative revenue in category
SUM(SUM(oi.quantity * oi.unit_price)) OVER (
PARTITION BY c.id
ORDER BY SUM(oi.quantity) DESC
) AS cumulative_revenue
FROM products p
JOIN categories c ON p.category_id = c.id
JOIN order_items oi ON oi.product_id = p.id
GROUP BY c.id, c.name, p.id, p.name
ORDER BY c.name, rank_in_category;
-- โโ Month-over-month revenue evolution with variance โโ
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', o.created_at) AS month,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY 1
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100, 2
) AS growth_pct
FROM monthly_revenue
ORDER BY month;
Definition: Named query (WITH clause) that calls itself, enabling processing of hierarchical or graph structures.
Purpose: Traverse data trees (parent/child categories, org charts) without complex application code.
Why here: A recursive CTE has two parts: the base case (root categories) and the recursive case (children of each parent).
-- โโ Traverse a category hierarchy โโ
WITH RECURSIVE category_tree AS (
-- Base case: root categories (no parent)
SELECT id, name, parent_id, 0 AS depth, name::text AS path
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive case: child categories
SELECT c.id, c.name, c.parent_id, ct.depth + 1,
(ct.path || ' > ' || c.name)::text
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
REPEAT(' ', depth) || name AS indented_name,
depth,
path
FROM category_tree
ORDER BY path;
Definition: Data structure optimized for fast search. B-tree indexes scalar columns, GIN indexes complex types (JSONB, arrays, full-text).
Purpose: Accelerate filters (WHERE), avoid full table scans (Seq Scan).
Why here: Without an index, PostgreSQL scans all rows (Seq Scan). With an index, it uses a tree structure to find matching rows in O(log n).
to_tsvector() tokenizes text, GIN indexes tokens, @@ searches matching tokens. Partial index: WHERE status IN ('pending', ...) reduces size by excluding completed orders (rarely filtered).-- โโ Analyze a slow query โโ
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending';
-- Before index: Seq Scan cost=0..1234 rows=15000
-- โโ Create a composite index โโ
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index: only unprocessed orders
CREATE INDEX idx_orders_pending ON orders(user_id)
WHERE status IN ('pending', 'processing');
-- After index: Index Scan cost=0..8 rows=3
-- โโ Index for full-text search โโ
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name));
SELECT name, price
FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'laptop & portable')
ORDER BY ts_rank(to_tsvector('english', name), to_tsquery('english', 'laptop & portable')) DESC;
-- โโ Partial unique index โโ
CREATE UNIQUE INDEX idx_users_active_email ON users(email)
WHERE deleted_at IS NULL; -- Email unique only for active accounts
Definition: Intentional copying of related data into the same document, instead of referencing it.
Purpose: Avoid costly JOINs (which don't exist in MongoDB), reduce queries, improve read performance.
Why here: Tradeoff: denormalization favors reads (one query = one complete document) but complicates updates (update user.name in all orders if name changes).
Definition: Series of transformation stages applied sequentially to documents.
Purpose: Filter, group, transform and sort data without client-side processing.
Why here: Each stage $match, $group, $sort produces a stream of documents feeding the next stage.
In MongoDB, data is denormalized to avoid costly joins:
// โโ Order document (complete order) โโ
{
_id: ObjectId("..."),
// User data copied at purchase time
user: {
id: "user_42",
name: "Alderi KAMTCHOUA",
email: "alderi@example.com"
},
status: "completed",
// Products included in order (no join needed)
items: [
{
product: {
id: "prod_1",
name: "MacBook Pro 16\"",
sku: "MBP-16-M3"
},
quantity: 1,
unitPrice: 2499.99
},
{
product: { id: "prod_5", name: "Magic Mouse" },
quantity: 2,
unitPrice: 79.99
}
],
totalAmount: 2659.97,
createdAt: ISODate("2025-01-15T10:30:00Z")
}
// โโ MongoDB aggregation: monthly revenue โโ
db.orders.aggregate([
{ $match: { status: "completed" } },
{
$group: {
_id: {
year: { $year: "$createdAt" },
month: { $month: "$createdAt" }
},
totalRevenue: { $sum: "$totalAmount" },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: "$totalAmount" }
}
},
{ $sort: { "_id.year": -1, "_id.month": -1 } }
]);
// โโ MongoDB indexes โโ
db.orders.createIndex({ "user.id": 1, status: 1 });
db.orders.createIndex({ createdAt: -1 });
db.orders.createIndex({ "items.product.id": 1 });
Definition: Column in one table that references the primary key of another table, enforcing referential integrity.
Purpose: Guarantee that each order.user_id corresponds to an existing user.id, prevent orphaned data.
Why here: SQL enforces foreign keys at the database level. MongoDB, without mandatory schema, cannot enforce this (application code responsibility).
# PostgreSQL / SQL
โ
Structured data with complex relationships
โ
Data integrity (ACID, constraints, foreign keys)
โ
Reports and analytics with JOINs and aggregations
โ
Multi-table transactions
โ
Stable, well-defined schema
โ Horizontal scalability is complex
โ Rigid schema (migrations required)
# MongoDB / NoSQL
โ
Semi-structured data (flexible JSON documents)
โ
Native horizontal scalability (sharding)
โ
Rapid development (no migrations)
โ
Natural hierarchies and nested lists
โ
High availability with replica sets
โ No native joins โ denormalization required
โ No multi-document ACID (since v4.0 only)