🗄️
Operational Databases
Transactional systems powering live applications
Relational Database (RDBMS)
PostgreSQL, MySQL, Oracle, SQL Server
Data stored in tables with defined schemas and enforced relationships. ACID transactions guarantee consistency. SQL provides a powerful, standardised query language. The default choice for structured, transactional data.
🏛️ Context: PostgreSQL is the default recommendation for new projects: open-source, full-featured, and backed by an excellent extension ecosystem. Oracle and SQL Server persist in enterprises because of existing investment. Evaluate managed services (RDS, Cloud SQL) against self-hosting.
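The sketch below shows schemas, enforced relationships, and an atomic transaction. It uses Python's built-in `sqlite3` module as a stand-in for a server RDBMS; PostgreSQL or SQL Server syntax and transaction settings differ in detail. The `customers`/`accounts` tables and the `transfer` helper are hypothetical. The second transfer fails its `CHECK` constraint part-way through, so the whole transaction rolls back.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE accounts (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            balance     INTEGER NOT NULL CHECK (balance >= 0)
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO accounts VALUES (10, 1, 100), (20, 2, 50);
    """)
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between accounts atomically: both updates apply or neither does."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))

def balances(conn: sqlite3.Connection) -> dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = make_db()
transfer(conn, 10, 20, 30)          # succeeds
try:
    transfer(conn, 20, 10, 500)     # the debit would make account 20 negative
except sqlite3.IntegrityError:
    pass                            # CHECK failed; the credit to account 10 was rolled back
print(balances(conn))               # {10: 70, 20: 80}

try:
    with conn:
        conn.execute("INSERT INTO accounts VALUES (30, 99, 0)")  # no customer 99 exists
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```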
Document Database
MongoDB, Cosmos DB, Couchbase, Firestore
Stores data as flexible JSON/BSON documents — no fixed schema required. Each document can have different fields. Excellent for content management, catalogues, user profiles, and rapidly evolving data models.
🏛️ Context: Document DBs trade schema enforcement for flexibility. This is powerful for agile development but dangerous without discipline — schema validation and data contracts are essential. Avoid for highly relational data.
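The sketch below illustrates the trade-off between flexibility and discipline. It assumes a hypothetical `products` collection held as plain Python dicts. The hand-rolled `validate` function and `PRODUCT_CONTRACT` stand in for a real schema validator; MongoDB, for instance, supports `$jsonSchema` validation rules on a collection. Neither is any database's API.

```python
from typing import Any

# Hypothetical data contract: field -> accepted type(s). Fields not listed are allowed.
PRODUCT_CONTRACT = {
    "sku": str,
    "name": str,
    "price": (int, float),
}

def validate(doc: dict[str, Any], contract: dict = PRODUCT_CONTRACT) -> list[str]:
    """Return contract violations for one document (an empty list means valid)."""
    errors = []
    for field, expected in contract.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    return errors

# Documents in one collection can carry different fields.
catalogue = [
    {"sku": "A1", "name": "Kettle", "price": 29.99, "wattage": 2200},
    {"sku": "B7", "name": "E-book", "price": 9, "file_format": "epub"},
    {"sku": "C3", "name": "Mug"},  # no price: a schemaless store accepts it, the contract does not
]

for doc in catalogue:
    print(doc["sku"], validate(doc) or "ok")
```

Extra fields such as `wattage` pass untouched. Only the agreed contract is enforced, which keeps most of the flexibility while catching missing or mistyped data.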
Key-Value Store
Redis, DynamoDB, Memcached, etcd
Simplest data model: each key maps to a value, so lookups by key are extremely fast. Redis adds richer data structures (lists, sets, sorted sets, streams). DynamoDB provides serverless key-value storage with single-digit-millisecond latency at any scale.
🏛️ Context: DynamoDB is the go-to for serverless architectures. Design tables around known access patterns (e.g. single-table design) rather than around entities. Redis serves both as a cache and as a primary store for session data, leaderboards, and real-time features.
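The sketch below illustrates the single-table idea using a hypothetical in-memory `SingleTable` class, not the DynamoDB API. Items use a composite key (partition key, sort key). Entity prefixes such as `CUSTOMER#` and `ORDER#` are a common naming convention, not a requirement. A customer's profile and orders share a partition, so the access pattern "orders for a customer" becomes one query on a sort-key prefix. This is similar in spirit to DynamoDB's `begins_with` key condition.

```python
from collections import defaultdict
from typing import Optional

class SingleTable:
    """In-memory stand-in for a key-value table with a composite primary key
    (partition key, sort key). Hypothetical: not the DynamoDB or boto3 API."""

    def __init__(self) -> None:
        self._partitions = defaultdict(dict)  # pk -> {sk: attributes}

    def put(self, pk: str, sk: str, attrs: dict) -> None:
        # Writing the same (pk, sk) again overwrites the item, as a key-value put does.
        self._partitions[pk][sk] = attrs

    def get(self, pk: str, sk: str) -> Optional[dict]:
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk: str, sk_prefix: str = "") -> list[tuple[str, dict]]:
        # Reads a single partition and returns items in sort-key order
        # whose sort key starts with sk_prefix.
        part = self._partitions.get(pk, {})
        return [(sk, part[sk]) for sk in sorted(part) if sk.startswith(sk_prefix)]

# Profile and orders share one partition, so fetching a customer's
# orders is a single query rather than a join.
table = SingleTable()
table.put("CUSTOMER#42", "PROFILE", {"name": "Ada"})
table.put("CUSTOMER#42", "ORDER#2024-03-01", {"total": 30})
table.put("CUSTOMER#42", "ORDER#2024-01-15", {"total": 12})

orders = table.query("CUSTOMER#42", "ORDER#")
print(orders)  # oldest order first, because ISO dates sort lexically
```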
Wide-Column Store
Apache Cassandra, ScyllaDB, HBase, Bigtable
Distributed databases optimised for massive write throughput and horizontal scaling across many nodes. Data is organised into rows and column families and partitioned across nodes by key. Masterless designs (Cassandra, ScyllaDB) have no single point of failure. Used for time-series, IoT, and event data.
🏛️ Context: Cassandra excels at write-heavy, globally distributed workloads. Data modelling is query-driven: denormalise aggressively, typically with one table per query. ScyllaDB is a Cassandra-compatible (CQL) reimplementation written in C++ rather than Java, aiming for higher per-node throughput and lower tail latency.
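The sketch below shows query-driven, denormalised modelling using plain Python dictionaries. The sensor schema is hypothetical. In Cassandra, each query would typically get its own CQL table, for example one with a composite partition key `((sensor_id, day))` and `CLUSTERING ORDER BY (ts DESC)`. Here every write goes to both structures, and each read touches exactly one partition.

```python
from collections import defaultdict
from datetime import datetime

# Q1: a sensor's readings for one day, newest first.
#     "Partition key" (sensor_id, day); rows ordered by timestamp.
readings_by_sensor_day = defaultdict(list)  # (sensor_id, day) -> [(ts, value)]
# Q2: which sensors reported on a given day. "Partition key": day.
sensors_by_day = defaultdict(set)           # day -> {sensor_id}

def write_reading(sensor_id: str, ts: datetime, value: float) -> None:
    """Denormalised write: the same reading feeds both query tables."""
    day = ts.date().isoformat()  # bucketing by day keeps each partition bounded
    readings_by_sensor_day[(sensor_id, day)].append((ts, value))
    sensors_by_day[day].add(sensor_id)

def latest_readings(sensor_id: str, day: str, limit: int = 10) -> list[tuple[datetime, float]]:
    # Reads exactly one partition; sorting imitates a DESC clustering order.
    rows = readings_by_sensor_day.get((sensor_id, day), [])
    return sorted(rows, reverse=True)[:limit]

write_reading("s1", datetime(2024, 5, 1, 9, 0), 21.5)
write_reading("s1", datetime(2024, 5, 1, 9, 5), 21.9)
write_reading("s2", datetime(2024, 5, 1, 9, 1), 18.2)

print(latest_readings("s1", "2024-05-01", limit=1))
print(sorted(sensors_by_day["2024-05-01"]))  # ['s1', 's2']
```

Real wide-column stores keep rows sorted on disk at write time. The `sorted` call here only imitates that order.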
Graph Database
Neo4j, Amazon Neptune, TigerGraph, ArangoDB
Stores data as nodes (entities) and edges (relationships). Excels at traversing complex, highly connected data: social networks, fraud detection, recommendation engines, knowledge graphs, and network topology.
🏛️ Context: Graph DBs solve problems that are expensive in relational systems, such as multi-hop joins. Use them when relationships matter as much as the data itself. Neo4j is the market leader; Neptune suits AWS-native stacks. For lighter needs, consider graph-on-relational options such as Apache AGE (a PostgreSQL extension).
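The sketch below performs a multi-hop traversal over a hypothetical `follows` graph stored as adjacency lists. Each hop simply follows stored neighbours, whereas a relational schema would need one self-join per hop. In Neo4j's Cypher, a variable-length pattern such as `MATCH (a {name: 'ada'})-[:FOLLOWS*1..2]->(b) RETURN DISTINCT b` expresses roughly the same question.

```python
from collections import deque

# Hypothetical social graph: node -> nodes it follows.
follows = {
    "ada": ["grace", "alan"],
    "grace": ["linus"],
    "alan": ["grace", "edsger"],
    "linus": [],
    "edsger": ["barbara"],
    "barbara": [],
}

def within_hops(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal: return {node: hop distance} for every node
    reachable from `start` in 1..max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen

print(within_hops(follows, "ada", 2))  # {'grace': 1, 'alan': 1, 'linus': 2, 'edsger': 2}
```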
Vector Database
Pinecone, Weaviate, Milvus, pgvector, Qdrant
Stores high-dimensional vector embeddings and enables similarity search. Core infrastructure for AI/ML applications — semantic search, RAG (Retrieval-Augmented Generation), recommendation systems, and image search.
🏛️ Context: Vector DBs are a key building block for AI-powered applications. Evaluate purpose-built databases (Pinecone, Weaviate) against extensions on existing databases (pgvector). Key factors: index type (HNSW, IVF), embedding dimensionality, and update frequency.
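The sketch below shows the core operation: exact (brute-force) nearest-neighbour search by cosine similarity over toy 3-dimensional embeddings. The document IDs and vectors are made up; real embeddings typically have hundreds or thousands of dimensions. Index types such as HNSW and IVF exist to approximate this result without scanning every vector.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact nearest-neighbour search: score every vector, keep the k best."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings for three help articles.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq":    [0.1, 0.9, 0.2],
    "login-help":     [0.8, 0.2, 0.1],
}

print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['reset-password', 'login-help']
```

Brute force costs one similarity computation per stored vector per query. That cost is why approximate indexes matter at scale, at the price of occasionally missing a true nearest neighbour.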