Understanding Query Performance
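PostgreSQL's query planner generates an execution plan for every query, and understanding these plans is the key to optimization. EXPLAIN ANALYZE is your most important diagnostic tool: it runs the query and reports the chosen plan alongside actual timings and row counts. Because the statement really executes, wrap data-modifying queries in a transaction and roll it back.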
EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Reading the Output
- Seq Scan — full table scan, no index used (often slow for large tables)
- Index Scan — uses an index to find rows
- Index Only Scan — satisfied from the index alone, skipping the table for pages marked all-visible (often the fastest)
- Nested Loop, Hash Join, Merge Join — join strategies
- actual time — measured time in milliseconds, shown as startup..total (time to first row, time to last row)
- rows — estimated vs actual row counts (a large discrepancy often points to stale statistics)
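To practice reading these fields, the following sketch builds a throwaway table and runs the same query before and after adding an index. The table name plan_demo and its columns are hypothetical and exist only for this demonstration; the plans and timings you get will vary with PostgreSQL version and configuration.

```sql
-- Hypothetical scratch table used only for this demonstration.
CREATE TEMP TABLE plan_demo AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 100000) AS g;

-- No index exists yet, so the filter has to read every row.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM plan_demo WHERE id = 4242;

-- Add an index and refresh statistics so the planner can consider it.
CREATE INDEX plan_demo_id_idx ON plan_demo (id);
ANALYZE plan_demo;

-- Same query again: compare the node type, the estimated vs actual rows,
-- and the actual time range with the first plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM plan_demo WHERE id = 4242;
```

The first plan should show a Seq Scan node, since no index exists yet; the second will usually switch to an index-based scan. In both, compare the rows= value in the cost estimate with the rows= value in the actual section.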
Indexes
When to Add an Index
-- Index for WHERE clause columns
CREATE INDEX idx_users_created_at ON users(created_at);
-- Composite index for multi-column WHERE
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index (index only a subset of rows)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Trigram index for LIKE / ILIKE pattern matching, including '%substring%' searches
-- Requires: CREATE EXTENSION pg_trgm; (run this before creating the index)
CREATE INDEX idx_users_email_text ON users USING gin(email gin_trgm_ops);
Index Types
- B-tree (default) — equality and range queries, ORDER BY
- GIN — full-text search, JSONB, arrays
- GiST — geometric data, range types
- BRIN — very large tables with naturally ordered data (timestamps)
- Hash — equality only (rarely needed)
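As a rough illustration of the non-default types, the sketch below creates one index of each kind. The events and bookings tables and all their columns are hypothetical and are not part of the schema used elsewhere in this article.

```sql
-- Hypothetical tables for illustration only.
CREATE TEMP TABLE events (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TEMP TABLE bookings (
    id        bigserial PRIMARY KEY,
    reference text NOT NULL,
    during    tstzrange NOT NULL
);

-- GIN: containment queries on JSONB, e.g. payload @> '{"type": "click"}'
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- BRIN: compact index for append-only data where created_at follows insert order
CREATE INDEX events_created_brin ON events USING brin (created_at);

-- GiST: overlap queries on range types, e.g. during && tstzrange(...)
CREATE INDEX bookings_during_gist ON bookings USING gist (during);

-- Hash: equality lookups only, e.g. reference = 'ABC123'
CREATE INDEX bookings_reference_hash ON bookings USING hash (reference);
```

B-tree remains the sensible default; the other types tend to pay off only when the query operators match what the index type supports.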
Query Optimization Patterns
Avoid SELECT *
-- Bad: fetches all columns, which usually rules out an Index Only Scan
SELECT * FROM users WHERE email = 'user@example.com';
-- Good: fetch only needed columns
SELECT id, name, email FROM users WHERE email = 'user@example.com';
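One way to see why the column list matters is a covering index. The sketch below is an assumption-laden demo, not a recommendation for any particular schema: users_demo is a hypothetical stand-in table, the index name is made up, and INCLUDE requires PostgreSQL 11 or later. VACUUM cannot run inside a transaction block, so run the statements in autocommit mode.

```sql
-- Hypothetical stand-in table with a wide column that makes SELECT * costly.
CREATE TEMP TABLE users_demo (
    id    bigserial PRIMARY KEY,
    name  text NOT NULL,
    email text NOT NULL,
    bio   text
);
INSERT INTO users_demo (name, email, bio)
SELECT 'User ' || g, 'user' || g || '@example.com', repeat('x', 200)
FROM generate_series(1, 50000) AS g;

-- Covering index: email is the search key, id and name are stored as payload.
CREATE INDEX users_demo_email_covering ON users_demo (email) INCLUDE (id, name);

-- VACUUM sets the visibility map that Index Only Scans depend on.
VACUUM ANALYZE users_demo;

-- Every requested column is in the index, so an Index Only Scan is possible.
EXPLAIN SELECT id, name, email FROM users_demo WHERE email = 'user42@example.com';

-- bio is not in the index, so the table itself has to be read.
EXPLAIN SELECT * FROM users_demo WHERE email = 'user42@example.com';
```

Whether the planner actually picks an Index Only Scan for the first query depends on table size and statistics, so treat the plans as something to inspect rather than a guaranteed result.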
Use CTEs for Readability (Not Performance)
-- CTE (easier to read)
WITH recent_orders AS (
    SELECT user_id, SUM(amount) AS total
    FROM orders
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT u.name, ro.total
FROM users u
JOIN recent_orders ro ON ro.user_id = u.id
WHERE ro.total > 1000;
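The reason CTEs are a readability tool rather than a performance tool comes down to how the planner treats them. Since PostgreSQL 12, a CTE that is non-recursive, has no side effects, and is referenced only once is folded into the outer query by default; in earlier versions every CTE was computed separately and acted as an optimization fence. The sketch below assumes PostgreSQL 12 or later and a hypothetical orders_demo table, and shows the keyword that overrides the default.

```sql
-- Hypothetical demo data: 100,000 orders spread over 1,000 users.
CREATE TEMP TABLE orders_demo AS
SELECT (g % 1000) + 1 AS user_id,
       (g % 500)::numeric AS amount,
       now() - (g % 90) * INTERVAL '1 day' AS created_at
FROM generate_series(1, 100000) AS g;
CREATE INDEX orders_demo_user_idx ON orders_demo (user_id);
ANALYZE orders_demo;

-- Default: the planner may inline the CTE and push user_id = 42 down into it.
EXPLAIN
WITH totals AS (
    SELECT user_id, SUM(amount) AS total
    FROM orders_demo
    GROUP BY user_id
)
SELECT * FROM totals WHERE user_id = 42;

-- MATERIALIZED forces the CTE to be computed in full before filtering.
EXPLAIN
WITH totals AS MATERIALIZED (
    SELECT user_id, SUM(amount) AS total
    FROM orders_demo
    GROUP BY user_id
)
SELECT * FROM totals WHERE user_id = 42;
```

Comparing the two plans shows whether the filter reaches the underlying table or is applied only after the whole aggregate has been built. NOT MATERIALIZED is the opposite hint, useful when a CTE is referenced more than once but inlining is still wanted.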
Update Statistics
ANALYZE users; -- update stats for one table
ANALYZE; -- update stats for every table in the current database
VACUUM ANALYZE users; -- mark dead-row space as reusable and update stats
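Before running ANALYZE by hand, it can help to check whether autovacuum is already keeping statistics current. A minimal sketch using the built-in pg_stat_user_tables view (the n_mod_since_analyze column is available from PostgreSQL 9.4):

```sql
-- Tables with the most row changes since statistics were last gathered.
SELECT relname,
       last_analyze,         -- last manual ANALYZE
       last_autoanalyze,     -- last ANALYZE run by autovacuum
       n_mod_since_analyze,  -- rows modified since then
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 20;
```

A table with a high n_mod_since_analyze relative to n_live_tup and old timestamps in both analyze columns is a reasonable candidate for a manual ANALYZE.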
Finding Slow Queries
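-- Enable pg_stat_statements
-- (the module must also be listed in shared_preload_libraries, which needs a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
-- (column names as of PostgreSQL 13; older versions use mean_time and total_time)
SELECT
    query,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;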
Connection Pooling
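PostgreSQL starts a new backend process for every connection, so each connection carries memory and process-management overhead. For applications with many short-lived connections, put a pooler such as PgBouncer or pgpool-II in front of the database. In production, a common starting point is 50-100 max connections to PostgreSQL, with the pooler multiplexing thousands of application connections onto them.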
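To judge whether a pooler would help, it is worth looking at how existing connections are actually being used. A minimal sketch using the built-in pg_stat_activity view (the backend_type column assumes PostgreSQL 10 or later):

```sql
-- Configured ceiling on concurrent connections.
SHOW max_connections;

-- Client connections grouped by state (active, idle, idle in transaction, ...).
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
```

A large share of idle connections relative to active ones suggests that a pooler could serve the same workload with far fewer server connections.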
Frequently Asked Questions
When is a sequential scan better than an index?
For small tables or queries that return more than ~5-10% of rows, a sequential scan is often faster than an index scan because it has lower overhead. PostgreSQL's planner typically makes this decision correctly.
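The effect of selectivity can be observed directly. The sketch below uses a hypothetical scan_demo table and the enable_seqscan setting, which is meant for experimentation in a single session and not for production configuration.

```sql
-- Hypothetical demo table with 200,000 rows and an index on id.
CREATE TEMP TABLE scan_demo AS
SELECT g AS id, (g % 10) AS bucket
FROM generate_series(1, 200000) AS g;
CREATE INDEX scan_demo_id_idx ON scan_demo (id);
ANALYZE scan_demo;

-- Selective predicate (well under 1% of rows): an index scan is the likely choice.
EXPLAIN SELECT * FROM scan_demo WHERE id < 100;

-- Unselective predicate (about 75% of rows): a sequential scan is the likely choice.
EXPLAIN SELECT * FROM scan_demo WHERE id < 150000;

-- For comparison only: discourage sequential scans and time the alternative plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM scan_demo WHERE id < 150000;
RESET enable_seqscan;
```

Comparing the estimated costs and, with EXPLAIN ANALYZE, the actual times of the two plans for the unselective query shows what the planner was weighing. The exact crossover point depends on table layout and cost settings.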
How do I find missing indexes?
Look for Seq Scan nodes on large tables in EXPLAIN ANALYZE output. Also query pg_stat_user_tables for tables with high seq_scan counts relative to idx_scan.
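The pg_stat_user_tables check described above could look like the following sketch. The ordering and the LIMIT are arbitrary choices for illustration, not established thresholds.

```sql
-- Tables read mostly by sequential scans, largest read volume first.
SELECT relname,
       seq_scan,                          -- number of sequential scans started
       seq_tup_read,                      -- rows read by those scans
       COALESCE(idx_scan, 0) AS idx_scan, -- NULL when the table has no indexes
       n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

Large tables near the top with few or no index scans are the ones worth examining with EXPLAIN ANALYZE; small tables can be ignored, since sequential scans are expected there.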