Adding a few more that go deeper into the real-world scaling and distributed-systems side of databases:
Consensus protocols (Raft, Paxos, Zab)
Sharding strategies (range, hash, consistent hashing)
Quorum reads/writes
Conflict resolution (CRDTs, vector clocks)
Snapshot isolation & MVCC
Compaction & tombstones (esp. in LSM-based systems)
Query optimizers & cost-based planning
Columnar storage formats (Parquet, ORC)
OLTP vs OLAP design differences
Data lake vs data warehouse architecture
Hot key mitigation & load balancing
Schema evolution in distributed systems
Data locality & co-location strategies
Write amplification and read amplification
ZooKeeper / etcd / Consul for coordination
Storage engines (InnoDB, RocksDB, WiredTiger)
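
To make one of these concrete, here is a minimal sketch of consistent hashing (from the sharding item above). It is a toy for building intuition, not how any particular database implements it. The `HashRing` class, the `vnodes` parameter, and the use of SHA-256 are all assumptions picked for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes per physical node (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position at or after the key."""
        if not self._points:
            raise LookupError("ring is empty")
        # Wrap around to the first position when the key hashes past the end.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property worth noticing: when a node joins, a key either stays where it was or moves to the new node. It never shuffles between existing nodes, which is the whole reason to prefer this over `hash(key) % n`.
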
Database stuff I’d study if I wanted to understand scaling deeply:
Bookmark this.
B+ Trees
LSM Trees
Write-Ahead Logging
Two-Phase Commit
Three-Phase Commit
Read Replicas
Leader-Follower Replication
Partitioning
Query Caching
Secondary Indexes
Vector Indexes (FAISS, HNSW)
Distributed Joins
Materialized Views
Event Sourcing
Change Data Capture
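
If you want a starting point for the list above, write-ahead logging is a good one to build by hand. Below is a minimal sketch, assuming a JSON-lines log file and a made-up `TinyKV` class with hypothetical `put` and `del` record types. Real engines add checksums, log truncation, and checkpoints, none of which are shown here.

```python
import json
import os


class TinyKV:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Rebuild in-memory state from the log after a restart or crash."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write: stop here
                if rec["op"] == "put":
                    self.data[rec["k"]] = rec["v"]
                elif rec["op"] == "del":
                    self.data.pop(rec["k"], None)

    def _append(self, rec: dict) -> None:
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the record to disk first

    def put(self, k, v) -> None:
        self._append({"op": "put", "k": k, "v": v})  # log first
        self.data[k] = v                             # then apply

    def delete(self, k) -> None:
        self._append({"op": "del", "k": k})
        self.data.pop(k, None)

    def close(self) -> None:
        self._log.close()
```

The ordering inside `put` is the point: the record is on disk before the in-memory state changes, so reopening the same log path after a crash replays every acknowledged write.
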