Database Indexing Strategies: Performance Optimization for PostgreSQL and MongoDB
Master database indexing strategies for PostgreSQL and MongoDB. Learn index types, optimization techniques, common pitfalls, and best practices to boost query performance.
Table of Contents
- Introduction
- Understanding Database Indexes
- PostgreSQL Indexing Strategies
- MongoDB Indexing Strategies
- Index Types and When to Use Them
- Performance Optimization Techniques
- Common Indexing Mistakes
- Monitoring and Maintenance
- Best Practices
- Conclusion
Introduction
Database performance is one of the most critical factors in modern application development. As your data grows, queries that once executed in milliseconds can suddenly take seconds or even minutes. The difference between a well-indexed database and a poorly indexed one can be the difference between a responsive application and one that frustrates users.
Indexing is one of the most powerful tools in a database administrator’s toolkit, yet it’s often misunderstood or underutilized. Whether you’re working with PostgreSQL’s relational model or MongoDB’s document-based approach, understanding how to create, maintain, and optimize indexes is essential for building scalable applications.
In this comprehensive guide, we’ll explore indexing strategies for both PostgreSQL and MongoDB, covering everything from basic concepts to advanced optimization techniques. You’ll learn how to identify performance bottlenecks, choose the right index types, avoid common pitfalls, and implement indexing strategies that scale with your application’s growth. By the end, you’ll have the knowledge to transform slow queries into lightning-fast operations.
Understanding Database Indexes
At its core, a database index is a data structure that improves the speed of data retrieval operations. Think of it like an index in a book—instead of scanning every page to find a topic, you can look it up in the index and jump directly to the relevant pages.
How Indexes Work
Indexes work by creating a separate data structure that stores a sorted copy of selected columns from your table (in PostgreSQL) or fields from your documents (in MongoDB). When you query the database, it can use the index to quickly locate the data without scanning the entire table or collection.
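The effect of that sorted copy can be sketched in a few lines of Python — a toy model in which a list stands in for the table and binary search stands in for a B-Tree descent, not PostgreSQL internals:

```python
def seq_scan(rows, target):
    """Sequential scan: check every row until the target is found."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row == target:
            break
    return comparisons

def index_scan(sorted_rows, target):
    """Index lookup: binary search over the sorted copy, like a B-Tree descent."""
    comparisons = 0
    lo, hi = 0, len(sorted_rows)
    while lo < hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if sorted_rows[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return comparisons

# A toy "table" of one million emails, in insertion order
table = [f"user{i}@example.com" for i in range(1_000_000)]
index = sorted(table)  # the "index": a sorted copy of the column

target = "user999999@example.com"
print(seq_scan(table, target))    # 1,000,000 comparisons: the whole table
print(index_scan(index, target))  # ~20 comparisons
```

The gap widens as the table grows: doubling the rows doubles the scan cost but adds only one comparison to the binary search.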
```sql
-- Without an index, PostgreSQL must scan the entire table
-- This is called a "sequential scan" or "full table scan"
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Result: Seq Scan on users  (cost=0.00..1000.00 rows=1 width=64)
--   Filter: (email = 'user@example.com'::text)
-- Execution time: 45.234 ms

-- With an index on the email column
CREATE INDEX idx_users_email ON users(email);

-- The same query now uses the index
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Result: Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=64)
--   Index Cond: (email = 'user@example.com'::text)
-- Execution time: 0.123 ms
```

The Trade-off: Storage vs Speed
⚠️ Important: Indexes come with trade-offs. While they dramatically improve read performance, they:
- Consume storage space: Each index requires additional disk space
- Slow down writes: Every INSERT, UPDATE, or DELETE must update the index
- Require maintenance: Indexes need to be maintained as data changes
The key is finding the right balance—creating indexes for frequently queried columns while avoiding over-indexing that slows down write operations.
When Indexes Are Most Effective
Indexes provide the most benefit when:
✅ High selectivity: The indexed column has many unique values
✅ Frequent queries: The column is used often in WHERE clauses, JOINs, or ORDER BY
✅ Large tables: Tables with thousands or millions of rows benefit most
✅ Read-heavy workloads: Applications with more reads than writes
❌ Low selectivity: Columns with few unique values (like boolean flags) rarely benefit
❌ Write-heavy workloads: Too many indexes can slow down INSERT/UPDATE operations
❌ Small tables: Tables with few rows may not benefit from indexes
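The selectivity rule of thumb can be checked directly against data. A short Python sketch over made-up rows (illustrative only) contrasts a unique email column with a two-valued flag:

```python
import random

random.seed(0)

# Made-up table: unique emails, a boolean flag set on about half the rows
rows = [
    {"email": f"user{i}@example.com", "is_active": random.random() < 0.5}
    for i in range(10_000)
]

def selectivity(rows, field):
    """Distinct values divided by total rows; closer to 1.0 favors an index."""
    return len({r[field] for r in rows}) / len(rows)

print(selectivity(rows, "email"))      # 1.0: every value is unique
print(selectivity(rows, "is_active"))  # 0.0002: just two distinct values

# An equality match on the flag still returns about half the table,
# so an index on it barely narrows the search compared with a full scan.
print(sum(r["is_active"] for r in rows) / len(rows))  # roughly 0.5
```

The same calculation appears later in SQL form (`COUNT(DISTINCT …) / COUNT(*)`) for measuring selectivity on a live table.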
PostgreSQL Indexing Strategies
PostgreSQL offers a rich set of indexing options, each optimized for different use cases. Understanding when to use each type is crucial for optimal performance.
B-Tree Indexes: The Default Choice
B-Tree (Balanced Tree) indexes are PostgreSQL’s default index type and work well for most scenarios. They’re efficient for equality and range queries.
```sql
-- Create a B-Tree index (default, so you can omit USING btree)
CREATE INDEX idx_users_created_at ON users(created_at);

-- Efficient for range queries
SELECT * FROM users
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY created_at;

-- Efficient for equality queries
SELECT * FROM users WHERE email = 'user@example.com';
```

Composite Indexes: Multi-Column Optimization
Composite indexes include multiple columns and are powerful for queries that filter or sort by multiple columns.
```sql
-- Create a composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);

-- This query can use the index efficiently
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC;

-- ✅ Efficient: Uses the index (leftmost prefix)
SELECT * FROM orders WHERE user_id = 123;

-- ✅ Efficient: Uses the index (leftmost prefix)
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- ❌ Less efficient: Can't use index effectively (status is not leftmost)
SELECT * FROM orders WHERE status = 'pending';
```

💡 Tip: The order of columns in a composite index matters. Place the most selective column first, and consider the order in which columns appear in your WHERE clauses.
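Why the leftmost prefix matters falls out of the sort order: index entries are ordered by the first column, then the second. A small Python sketch (a sorted list standing in for the index, not an actual B-Tree) shows a leading-column lookup binary-searching while a trailing-column lookup degenerates to a scan:

```python
import bisect

# A composite "index" on (user_id, status): sorted by user_id, then status
index = sorted([
    (101, "pending"), (101, "shipped"), (102, "pending"),
    (103, "cancelled"), (103, "pending"), (104, "shipped"),
])

def lookup_by_user(user_id):
    """Leftmost prefix: entries for a user_id are contiguous, so binary search works."""
    lo = bisect.bisect_left(index, (user_id,))
    hi = bisect.bisect_left(index, (user_id + 1,))
    return index[lo:hi]

def lookup_by_status(status):
    """No leftmost prefix: matching entries are scattered, forcing a full scan."""
    return [entry for entry in index if entry[1] == status]

print(lookup_by_user(101))          # [(101, 'pending'), (101, 'shipped')]
print(lookup_by_status("pending"))  # scattered across the whole index
```

The same reasoning explains why `(user_id, status)` also answers `user_id`-only queries but `(status, user_id)` does not.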
Partial Indexes: Indexing Subsets
Partial indexes only index rows that meet a specific condition, reducing index size and improving performance for filtered queries.
```sql
-- Index only active users (assumes most users are inactive)
CREATE INDEX idx_users_active_email ON users(email)
WHERE is_active = true;

-- This query uses the partial index
SELECT * FROM users WHERE email = 'user@example.com' AND is_active = true;

-- This query cannot use the partial index
SELECT * FROM users WHERE email = 'user@example.com' AND is_active = false;
```

Expression Indexes: Indexing Computed Values
Expression indexes allow you to index the result of a function or expression, useful for case-insensitive searches or computed columns.
```sql
-- Case-insensitive email search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Efficient case-insensitive query
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on a computed expression
CREATE INDEX idx_orders_total ON orders((quantity * unit_price));
SELECT * FROM orders WHERE (quantity * unit_price) > 1000;
```

Hash Indexes: Fast Equality Lookups
Hash indexes are optimized for equality comparisons but don’t support range queries or sorting.
```sql
-- Create a hash index
CREATE INDEX idx_users_email_hash ON users USING HASH(email);

-- ✅ Efficient: Equality comparison
SELECT * FROM users WHERE email = 'user@example.com';

-- ❌ Cannot use hash index: Range query
SELECT * FROM users WHERE email > 'user@example.com';
```

⚠️ Note: Hash indexes don't support range queries or ORDER BY operations. Use B-Tree indexes if you need these features.
GIN and GiST Indexes: Full-Text and Complex Data
GIN (Generalized Inverted Index) and GiST (Generalized Search Tree) indexes are specialized for full-text search, JSONB data, arrays, and geometric data.
```sql
-- GIN index for full-text search
CREATE INDEX idx_posts_content_gin ON posts USING GIN(to_tsvector('english', content));

-- Efficient full-text search
SELECT * FROM posts
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'database & performance');

-- GIN index for JSONB
CREATE INDEX idx_users_metadata_gin ON users USING GIN(metadata);

-- Efficient JSONB queries
SELECT * FROM users WHERE metadata @> '{"role": "admin"}'::jsonb;
```

Covering Indexes: Include Non-Key Columns
Covering indexes (using the INCLUDE clause) store additional columns in the index leaf pages, allowing some queries to be answered entirely from the index without accessing the table.
```sql
-- Covering index that includes frequently accessed columns
CREATE INDEX idx_orders_user_covering ON orders(user_id, status)
INCLUDE (total_amount, created_at);

-- This query can be answered entirely from the index
SELECT user_id, status, total_amount, created_at
FROM orders
WHERE user_id = 123 AND status = 'completed';
```

MongoDB Indexing Strategies
MongoDB’s document-based model requires different indexing strategies, but the core principles remain the same. Understanding MongoDB’s index types and how they interact with queries is essential for optimal performance.
Single Field Indexes: Basic Performance Boost
Single field indexes are the simplest type and work similarly to PostgreSQL’s single-column indexes.
```javascript
// Create a single field index
db.users.createIndex({ email: 1 });

// Efficient query using the index
db.users.find({ email: "user@example.com" });

// Index direction mainly matters for compound sorts; a single-field
// index can be traversed in either direction
db.users.createIndex({ created_at: -1 }); // Descending
db.users.find({}).sort({ created_at: -1 }); // Uses index efficiently
```

Compound Indexes: Multi-Field Queries
Compound indexes in MongoDB work similarly to PostgreSQL’s composite indexes, with the same leftmost prefix rule.
```javascript
// Create a compound index
db.orders.createIndex({ user_id: 1, status: 1, created_at: -1 });

// ✅ Efficient: Uses leftmost prefix
db.orders.find({ user_id: 123 });

// ✅ Efficient: Uses leftmost prefix
db.orders.find({ user_id: 123, status: "pending" }).sort({ created_at: -1 });

// ❌ Less efficient: Can't use index effectively
db.orders.find({ status: "pending" });
```

Multikey Indexes: Indexing Arrays
MongoDB automatically creates multikey indexes when you index an array field, creating an index entry for each array element.
```javascript
// Create an index on an array field
db.products.createIndex({ tags: 1 });

// Efficient query on array elements
db.products.find({ tags: "electronics" });

// Multikey index on nested arrays
db.users.createIndex({ "addresses.city": 1 });
db.users.find({ "addresses.city": "New York" });
```

Text Indexes: Full-Text Search
MongoDB’s text indexes provide full-text search capabilities across string content.
```javascript
// Create a text index
db.articles.createIndex({ title: "text", content: "text" });

// Full-text search query
db.articles.find({ $text: { $search: "database performance" } });

// Text search with relevance scoring
db.articles
  .find(
    { $text: { $search: "database performance" } },
    { score: { $meta: "textScore" } },
  )
  .sort({ score: { $meta: "textScore" } });
```

Geospatial Indexes: Location-Based Queries
MongoDB supports 2dsphere indexes for geospatial queries, essential for location-based applications.
```javascript
// Create a 2dsphere index
db.places.createIndex({ location: "2dsphere" });

// Find places near a location
db.places.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [-73.97, 40.77], // Longitude, Latitude
      },
      $maxDistance: 1000, // meters
    },
  },
});
```

Partial Indexes: Conditional Indexing
MongoDB supports partial indexes with a filter expression, similar to PostgreSQL’s partial indexes.
```javascript
// Index only active users
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { is_active: true } },
);

// Efficient query on active users
db.users.find({ email: "user@example.com", is_active: true });

// Cannot use the partial index
db.users.find({ email: "user@example.com", is_active: false });
```

TTL Indexes: Automatic Expiration
TTL (Time To Live) indexes automatically remove documents after a specified time period, useful for session data, logs, or temporary documents.
```javascript
// Create a TTL index that expires documents after 24 hours
db.sessions.createIndex({ created_at: 1 }, { expireAfterSeconds: 86400 });

// Documents with created_at older than 24 hours are automatically deleted
```

Sparse Indexes: Indexing Only Existing Fields
Sparse indexes only include documents that have the indexed field, useful for optional fields.
```javascript
// Create a sparse index
db.users.createIndex({ phone_number: 1 }, { sparse: true });

// Efficient query (only searches indexed documents)
db.users.find({ phone_number: "+1234567890" });
```

Index Types and When to Use Them
Choosing the right index type depends on your query patterns, data characteristics, and performance requirements. Here’s a comprehensive guide to help you decide.
PostgreSQL Index Type Selection
| Index Type | Best For | Limitations |
|---|---|---|
| B-Tree | Most common use cases, equality and range queries, sorting | Not suited to full-text, containment, or geometric predicates |
| Hash | Fast equality lookups only | No range queries, no ORDER BY, not WAL-logged before PostgreSQL 10 |
| GIN | Full-text search, JSONB, arrays, multiple values | Larger than B-Tree, slower updates |
| GiST | Full-text search, geometric data, custom data types | Slower than GIN for full-text, more flexible |
| BRIN | Very large tables with natural ordering | Only effective when values correlate with physical row order |
| SP-GiST | Specialized data structures (trees, tries) | Niche use cases |
MongoDB Index Type Selection
| Index Type | Best For | Limitations |
|---|---|---|
| Single Field | Simple queries on one field | Basic use case |
| Compound | Multi-field queries and sorting | Column order matters (leftmost prefix) |
| Multikey | Array fields, nested documents | At most one array field per compound index |
| Text | Full-text search across strings | One text index per collection |
| 2dsphere | Geospatial queries | Requires GeoJSON format |
| TTL | Automatic document expiration | Only works with date fields |
| Sparse | Optional fields | Skips documents without the field |
| Partial | Conditional indexing | Requires filter expression |
Decision Matrix
Use this decision matrix to choose the right index:
```sql
-- PostgreSQL: Decision flow
-- 1. Is it a simple equality/range query? → B-Tree
CREATE INDEX idx_users_email ON users(email);

-- 2. Is it full-text search? → GIN
CREATE INDEX idx_posts_content_gin ON posts USING GIN(to_tsvector('english', content));

-- 3. Is it JSONB queries? → GIN
CREATE INDEX idx_users_metadata_gin ON users USING GIN(metadata);

-- 4. Is it a very large table with sorted data? → BRIN
CREATE INDEX idx_logs_timestamp_brin ON logs USING BRIN(timestamp);
```

```javascript
// MongoDB: Decision flow
// 1. Simple field query? → Single Field Index
db.users.createIndex({ email: 1 });

// 2. Multiple fields? → Compound Index
db.orders.createIndex({ user_id: 1, status: 1 });

// 3. Array field? → Multikey Index (automatic)
db.products.createIndex({ tags: 1 });

// 4. Full-text search? → Text Index
db.articles.createIndex({ title: "text", content: "text" });

// 5. Geospatial? → 2dsphere Index
db.places.createIndex({ location: "2dsphere" });
```

Performance Optimization Techniques
Creating indexes is just the beginning. Optimizing them for your specific workload requires understanding query patterns, analyzing execution plans, and fine-tuning based on real-world performance data.
Analyzing Query Performance
Both PostgreSQL and MongoDB provide tools to analyze query performance and understand how indexes are being used.
PostgreSQL: EXPLAIN and EXPLAIN ANALYZE
```sql
-- EXPLAIN shows the query plan without executing
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- EXPLAIN ANALYZE executes the query and shows actual performance
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Understanding the output:
-- Seq Scan = Full table scan (bad for large tables)
-- Index Scan = Using an index (good)
-- Index Only Scan = Answered from the index alone (best)
-- Bitmap Index Scan = Multiple index scans combined

-- Enable detailed query statistics
SET enable_seqscan = off; -- Force index usage (for testing only)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending';
```

MongoDB: explain() and Execution Stats
```javascript
// Get the query execution plan
db.orders.find({ user_id: 123, status: "pending" }).explain("executionStats");

// Key metrics to watch:
// - executionStats.executionTimeMillis: Total execution time
// - executionStats.totalDocsExamined: Documents scanned
// - executionStats.totalKeysExamined: Index keys examined
// - executionStats.executionStages.stage: "IXSCAN" = using an index (good)
//                                         "COLLSCAN" = collection scan (bad)

// Analyze an aggregation pipeline ($explain is not a pipeline stage;
// use the explain() helper instead)
db.orders
  .explain("executionStats")
  .aggregate([{ $match: { user_id: 123, status: "pending" } }]);
```

Index Selectivity and Cardinality
Selectivity measures how unique the values in an indexed column are. Higher selectivity means better index performance.
```sql
-- PostgreSQL: Check index usage statistics
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS index_scans,
  idx_tup_read AS tuples_read,
  idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Calculate selectivity
SELECT
  COUNT(DISTINCT status)::float / COUNT(*) AS selectivity,
  COUNT(*) AS total_rows
FROM orders;
-- Selectivity close to 1.0 = high selectivity (good)
-- Selectivity close to 0.0 = low selectivity (bad)
```

```javascript
// MongoDB: List a collection's indexes
db.orders.getIndexes();

// Inspect value distribution (a proxy for selectivity)
db.orders.aggregate([
  { $group: { _id: "$status", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
]);
```

Index Maintenance and Rebuilding
Over time, indexes can become fragmented or bloated, especially in write-heavy workloads. Regular maintenance is essential.
PostgreSQL: REINDEX and VACUUM
```sql
-- Rebuild a specific index
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY users;

-- Analyze table statistics (helps the query planner)
ANALYZE users;

-- Vacuum to reclaim space and update statistics
VACUUM ANALYZE users;

-- Check index bloat
SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS index_scans
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
```

MongoDB: Rebuild Indexes
```javascript
// Rebuild all indexes on a collection
// (reIndex() is a blocking operation and is deprecated in recent
// MongoDB versions; dropping and recreating is usually preferred)
db.orders.reIndex();

// Drop and recreate an index
db.orders.dropIndex("user_id_1_status_1");
db.orders.createIndex({ user_id: 1, status: 1 });

// Check index sizes
db.orders.stats().indexSizes;
```

Query Optimization Strategies
Optimizing queries goes beyond just creating indexes. Understanding how to write queries that leverage indexes effectively is crucial.
PostgreSQL: Query Optimization
```sql
-- ✅ Good: Uses index on email
SELECT * FROM users WHERE email = 'user@example.com';

-- ❌ Bad: Function on indexed column prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Solution: Create expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- ✅ Good: Uses composite index efficiently
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC;

-- ❌ Bad: OR conditions can prevent index usage
SELECT * FROM orders
WHERE user_id = 123 OR status = 'pending';
-- Solution: Use UNION or separate queries
```
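A sketch of the UNION rewrite against the same hypothetical orders table — note that PostgreSQL can often satisfy an indexed OR with a BitmapOr plan on its own, so compare EXPLAIN ANALYZE output before committing to either form:

```sql
-- Each branch can use its own index; UNION removes duplicate rows
SELECT * FROM orders WHERE user_id = 123
UNION
SELECT * FROM orders WHERE status = 'pending';
```

If duplicates are impossible or acceptable, UNION ALL skips the deduplication step and is cheaper.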
```sql
-- ✅ Good: LIMIT with ORDER BY uses index
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
```
```sql
-- ❌ Bad: A large OFFSET scans and discards many rows
SELECT * FROM users ORDER BY created_at DESC OFFSET 10000;
-- Solution: Use cursor-based pagination
```

MongoDB: Query Optimization
```javascript
// ✅ Good: Uses index efficiently
db.orders
  .find({ user_id: 123, status: "pending" })
  .sort({ created_at: -1 })
  .limit(10);

// ❌ Bad: $or can prevent index usage
db.orders.find({
  $or: [{ user_id: 123 }, { status: "pending" }],
});
// Solution: Use $in or separate queries
```
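For clarity, `$in` fits when the alternatives are values of the same field; a mixed-field `$or` like the one above instead needs an index on each branch. A sketch against the same hypothetical collection:

```javascript
// ✅ Same field, multiple values: $in can use a single index on status
db.orders.find({ user_id: 123, status: { $in: ["pending", "processing"] } });

// ✅ Different fields: give each $or branch its own index
// so MongoDB can evaluate each clause with an index
db.orders.createIndex({ user_id: 1 });
db.orders.createIndex({ status: 1 });
db.orders.find({ $or: [{ user_id: 123 }, { status: "pending" }] });
```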
```javascript
// ✅ Good: Compound index supports query and sort
db.orders.find({ user_id: 123 }).sort({ created_at: -1 });

// ❌ Bad: Sort on non-indexed field
db.orders.find({ user_id: 123 }).sort({ total_amount: -1 });
// Solution: Add total_amount to compound index

// ✅ Good: Covered query (all fields in index, _id excluded)
db.orders.find(
  { user_id: 123 },
  { user_id: 1, status: 1, created_at: 1, _id: 0 },
);

// ❌ Bad: Query requires fetching documents
db.orders.find(
  { user_id: 123 },
  { user_id: 1, status: 1, total_amount: 1 }, // total_amount not in index
);
```

Index-Only Scans and Covered Queries
The fastest queries are those that can be answered entirely from the index without accessing the table data.
PostgreSQL: Index-Only Scans
```sql
-- Create covering index
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (total_amount, created_at);

-- Query that uses an index-only scan
EXPLAIN ANALYZE
SELECT user_id, status, total_amount, created_at
FROM orders
WHERE user_id = 123 AND status = 'pending';

-- Result: Index Only Scan using idx_orders_covering
--   Heap Fetches: 0 (no table access needed!)
```

MongoDB: Covered Queries
```javascript
// Create compound index
db.orders.createIndex({ user_id: 1, status: 1, created_at: -1 });

// Covered query (all fields in index, _id excluded)
db.orders.find(
  { user_id: 123, status: "pending" },
  { user_id: 1, status: 1, created_at: 1, _id: 0 },
);

// Check if the query is covered
db.orders
  .find(
    { user_id: 123, status: "pending" },
    { user_id: 1, status: 1, created_at: 1, _id: 0 },
  )
  .explain("executionStats");
// The winning plan should be PROJECTION_COVERED over an IXSCAN,
// and executionStats.totalDocsExamined should be 0
```

Common Indexing Mistakes
Even experienced developers make indexing mistakes. Understanding common pitfalls helps you avoid them and optimize your database performance from the start.
Over-Indexing: Too Much of a Good Thing
Creating too many indexes can slow down write operations and consume excessive storage.
```sql
-- ❌ Bad: Indexing every column
CREATE INDEX idx_users_id ON users(id);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_users_updated_at ON users(updated_at);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_role ON users(role);
-- ... and so on

-- ✅ Good: Index only frequently queried columns
CREATE INDEX idx_users_email ON users(email); -- Used in WHERE clauses
CREATE INDEX idx_users_created_at ON users(created_at); -- Used for sorting
-- Skip indexes on rarely queried columns
```

💡 Tip: Monitor index usage and remove unused indexes. A good rule of thumb is to keep no more than 5-7 indexes per table unless you have a specifically read-heavy workload.
Under-Indexing: Missing Critical Indexes
Failing to index frequently queried columns results in full table scans and poor performance.
```sql
-- ❌ Bad: No index on a frequently queried column
SELECT * FROM orders WHERE user_id = 123; -- Full table scan!

-- ✅ Good: Create an index on the frequently queried column
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT * FROM orders WHERE user_id = 123; -- Index scan!
```

Wrong Column Order in Composite Indexes
The order of columns in composite indexes matters due to the leftmost prefix rule.
```sql
-- ❌ Bad: Wrong column order for this workload
CREATE INDEX idx_orders_status_user ON orders(status, user_id);
-- Queries that filter only by user_id cannot use this index
-- (user_id is not a leftmost prefix)

-- ✅ Good: Correct column order
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Supports WHERE user_id = 123 alone
-- and WHERE user_id = 123 AND status = 'pending'
```

Indexing Low-Selectivity Columns
Indexing columns with few unique values (like boolean flags) rarely improves performance.
```sql
-- ❌ Bad: Indexing a low-selectivity column
CREATE INDEX idx_users_is_active ON users(is_active);
-- Only 2 unique values (true/false); the index provides little benefit

-- ✅ Good: Use a partial index instead
CREATE INDEX idx_users_active_email ON users(email)
WHERE is_active = true;
-- Only indexes active users, much more selective
```

Ignoring Query Patterns
Creating indexes without understanding actual query patterns leads to ineffective indexes.
```sql
-- ❌ Bad: Index doesn't match the query pattern
CREATE INDEX idx_orders_user ON orders(user_id);
-- But queries always filter by user_id AND status
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
-- The index helps, but it isn't optimal

-- ✅ Good: Index matches the query pattern
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Matches the actual query pattern
```

Not Considering Sort Operations
Indexes can support both filtering and sorting, but the column order must match the query.
```sql
-- ❌ Bad: Index doesn't support the sort
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Query: WHERE user_id = 123 ORDER BY created_at DESC
-- Cannot use the index for sorting

-- ✅ Good: Include the sort column in the index
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
-- Supports both filtering and sorting
```

Function-Based Queries Without Expression Indexes
Using functions on indexed columns prevents index usage unless you create expression indexes.
```sql
-- ❌ Bad: A function prevents index usage
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Cannot use the index (function on the indexed column)

-- ✅ Good: Create an expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Uses the expression index
```

MongoDB-Specific Mistakes
```javascript
// ❌ Bad: Mixed sort directions don't match the compound index
db.orders.createIndex({ user_id: 1, created_at: 1 });
db.orders.find({}).sort({ user_id: 1, created_at: -1 }); // In-memory sort

// ✅ Good: Match index directions to the sort
db.orders.createIndex({ user_id: 1, created_at: -1 });
db.orders.find({}).sort({ user_id: 1, created_at: -1 }); // Uses index efficiently
// (A single-field index can be traversed in either direction, so direction
// only matters for compound sorts.)

// ❌ Bad: Compound-indexing two array fields
// db.products.createIndex({ tags: 1, categories: 1 });
// Fails when both fields hold arrays in the same document (parallel arrays)

// ✅ Good: Compound index with at most one array field
db.products.createIndex({ tags: 1, price: 1 });
```

Monitoring and Maintenance
Regular monitoring and maintenance ensure your indexes continue to perform well as your data grows and query patterns evolve.
PostgreSQL Monitoring
```sql
-- Check index usage statistics
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS index_scans,
  idx_tup_read AS tuples_read,
  idx_tup_fetch AS tuples_fetched,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC; -- Find unused indexes (low scan count)

-- Find unused indexes
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname = 'public'
  AND indexname NOT LIKE 'pg_toast%';

-- Classify indexes by usage
SELECT
  schemaname || '.' || tablename AS table_name,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS index_scans,
  CASE
    WHEN idx_scan = 0 THEN 'UNUSED'
    WHEN idx_scan < 100 THEN 'LOW USAGE'
    ELSE 'ACTIVE'
  END AS usage_status
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Monitor slow queries (requires the pg_stat_statements extension)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000 -- Queries averaging more than 1 second
ORDER BY mean_exec_time DESC
LIMIT 10;
```

MongoDB Monitoring
```javascript
// Check index usage
db.orders.aggregate([{ $indexStats: {} }]);

// Find rarely used indexes
db.orders.aggregate([
  { $indexStats: {} },
  { $match: { "accesses.ops": { $lt: 100 } } }, // Fewer than 100 operations
]);

// Monitor slow queries
db.setProfilingLevel(1, { slowms: 100 }); // Log queries slower than 100 ms
db.system.profile.find().sort({ ts: -1 }).limit(10);

// Check index sizes
db.orders.stats();

// Analyze collection statistics, including per-index details
db.orders.stats({ indexDetails: true });
```

Automated Maintenance Scripts
Create maintenance scripts to keep your indexes optimized:
```sql
-- PostgreSQL: Automated index maintenance script
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT indexrelid::regclass AS index_name
    FROM pg_stat_user_indexes
    WHERE schemaname = 'public'
      AND idx_scan = 0
      AND indexrelid::regclass::text NOT LIKE 'pg_toast%'
  LOOP
    RAISE NOTICE 'Dropping unused index: %', r.index_name;
    -- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction
    -- (and a DO block is one); run the generated statements separately.
    -- EXECUTE format('DROP INDEX CONCURRENTLY %I', r.index_name);
  END LOOP;
END $$;

-- Rebuild fragmented indexes
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN
    SELECT indexrelid::regclass AS index_name
    FROM pg_stat_user_indexes
    WHERE schemaname = 'public'
      AND idx_scan > 1000 -- Only rebuild frequently used indexes
  LOOP
    RAISE NOTICE 'Rebuilding index: %', r.index_name;
    -- Note: REINDEX CONCURRENTLY also cannot run inside a transaction;
    -- run the generated statements separately.
    -- EXECUTE format('REINDEX INDEX CONCURRENTLY %I', r.index_name);
  END LOOP;
END $$;
```

```javascript
// MongoDB: Automated index maintenance script
function maintainIndexes() {
  const collections = db.getCollectionNames();

  collections.forEach((collectionName) => {
    const collection = db.getCollection(collectionName);
    const indexes = collection.getIndexes();

    indexes.forEach((index) => {
      const stats = collection
        .aggregate([{ $indexStats: {} }, { $match: { name: index.name } }])
        .toArray();

      if (stats.length > 0 && stats[0].accesses.ops < 10) {
        print(`Unused index found: ${collectionName}.${index.name}`);
        // Uncomment to drop:
        // collection.dropIndex(index.name);
      }
    });
  });
}

maintainIndexes();
```

Performance Baselines and Alerts
Establish performance baselines and set up alerts for degradation:
```sql
-- PostgreSQL: Capture a performance baseline snapshot
-- (a plain view over pg_stat_user_indexes would always show current
-- values, so materialize the snapshot as a table instead)
CREATE TABLE index_performance_baseline AS
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  NOW() AS baseline_date
FROM pg_stat_user_indexes
WHERE schemaname = 'public';

-- Compare current activity to the baseline
-- (idx_scan counters are cumulative, so look at growth since the
-- snapshot; little or no growth means the index has gone quiet)
SELECT
  cur.indexname,
  base.idx_scan AS baseline_scans,
  cur.idx_scan AS current_scans,
  cur.idx_scan - base.idx_scan AS scans_since_baseline
FROM pg_stat_user_indexes cur
JOIN index_performance_baseline base
  ON cur.indexname = base.indexname
ORDER BY scans_since_baseline ASC;
```

Best Practices
Following these best practices will help you create and maintain efficient indexes that scale with your application.
Index Creation Best Practices
✅ Start with query analysis: Analyze your most common queries before creating indexes
✅ Index foreign keys: Foreign keys are frequently used in JOINs and WHERE clauses
✅ Index columns in WHERE clauses: Prioritize columns used in WHERE conditions
✅ Consider sort operations: Include ORDER BY columns in indexes when possible
✅ Use composite indexes wisely: Create composite indexes for multi-column queries
✅ Monitor index usage: Regularly review which indexes are actually being used
✅ Test with production-like data: Index performance varies with data volume and distribution
Index Maintenance Best Practices
✅ Regular VACUUM/ANALYZE: Keep PostgreSQL statistics up to date
✅ Rebuild fragmented indexes: Periodically rebuild indexes in write-heavy workloads
✅ Remove unused indexes: Drop indexes that aren't being used
✅ Monitor index bloat: Watch for indexes that consume excessive space
✅ Update statistics: Keep query planner statistics current for optimal plans
Query Writing Best Practices
✅ Write index-friendly queries: Avoid functions on indexed columns when possible
✅ Use EXPLAIN/explain(): Always analyze query plans before deploying
✅ Leverage covering indexes: Design queries to use index-only scans when possible
✅ Avoid SELECT *: Only select columns you need (helps with covering indexes)
✅ Use LIMIT: Limit result sets to reduce data transfer and improve performance
PostgreSQL-Specific Best Practices
```sql
-- ✅ Use CONCURRENTLY for production index builds
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Prevents table locking during index creation

-- ✅ Use partial indexes for filtered data
CREATE INDEX idx_users_active_email ON users(email)
WHERE is_active = true;

-- ✅ Use expression indexes for function-based queries
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- ✅ Include columns in covering indexes
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (total_amount, created_at);
```

MongoDB-Specific Best Practices
```javascript
// ✅ Index large collections with minimal disruption
// (background: true is deprecated; since MongoDB 4.2 all index builds
// use an optimized process that only briefly locks the collection)
db.users.createIndex({ email: 1 }, { background: true });

// ✅ Create compound indexes that support multiple query patterns
db.orders.createIndex({ user_id: 1, status: 1, created_at: -1 });
// Supports: user_id; user_id + status; user_id + status + sort on created_at

// ✅ Use sparse indexes for optional fields
db.users.createIndex({ phone_number: 1 }, { sparse: true });

// ✅ Consider index intersection (MongoDB can combine multiple indexes)
// But prefer compound indexes when possible
```

Performance Testing Best Practices
✅ Test with realistic data volumes: Index performance changes with data size
✅ Test write performance: Indexes impact INSERT/UPDATE/DELETE operations
✅ Monitor query execution times: Track performance metrics over time
✅ Load test: Test indexes under production-like load conditions
✅ A/B test index changes: Compare performance before and after index changes
Documentation Best Practices
✅ Document index purpose: Note why each index exists and what queries it supports
✅ Document maintenance schedule: Record when indexes were last rebuilt or analyzed
✅ Track index changes: Version control index creation scripts
✅ Monitor index metrics: Keep records of index usage and performance over time
Conclusion
Database indexing is both an art and a science. While the fundamental concepts are straightforward, mastering indexing strategies requires understanding your specific data patterns, query workloads, and performance requirements. Whether you’re working with PostgreSQL’s relational model or MongoDB’s document-based approach, the principles remain the same: create indexes that match your query patterns, monitor their effectiveness, and maintain them regularly.
The key takeaways from this guide:
- Start with query analysis: Understand your query patterns before creating indexes
- Choose the right index type: B-Tree for most cases, specialized indexes for specific needs
- Optimize composite indexes: Column order matters due to the leftmost prefix rule
- Monitor and maintain: Regularly review index usage and rebuild when necessary
- Avoid common mistakes: Don’t over-index, don’t ignore query patterns, and don’t forget about sort operations
Remember that indexing is an iterative process. As your application evolves and query patterns change, your indexing strategy should evolve too. Regular monitoring and maintenance ensure your indexes continue to provide optimal performance as your data grows.
For more in-depth information, refer to the PostgreSQL documentation on indexes and the MongoDB indexing documentation. If you’re still deciding between databases, check out our comparison guides on PostgreSQL vs MySQL vs SQLite and MongoDB vs PostgreSQL.
By applying these indexing strategies and best practices, you’ll be well-equipped to build fast, scalable database-backed applications that perform well under load and grow with your user base.