Caching is a good way to boost the performance of a web application.
But it's not as simple as flipping a switch.
There are several backend read and write caching strategies.
Here is how to pick the right one:
1. Cache aside.
The application reads data from the cache first. If the data is there (a hit), it's returned to the client. Otherwise (a miss), the application retrieves the data from the database and then writes it to the cache (sketch below).
Pros & Cons:
- database and cache can handle different data models
- lazy loading prevents the cache from being loaded with unnecessary data
- can return stale data without a proper expiration policy
- many initial cache misses (cold start)
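A minimal cache-aside sketch in Python, assuming a local Redis instance via redis-py; the fetch_user_from_db helper is a hypothetical stand-in for a real database query:

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def fetch_user_from_db(user_id):
    # Hypothetical database call; replace with a real query.
    return {"id": user_id, "name": "example"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)             # 1. try the cache first
    if cached is not None:
        return json.loads(cached)       # cache hit
    user = fetch_user_from_db(user_id)  # 2. miss: read from the database
    cache.set(key, json.dumps(user), ex=ttl_seconds)  # 3. populate the cache with a TTL
    return user
```

Note the TTL on the cache write: it limits how long stale data can be served after the database changes.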
2. Read through.
Like cache aside, but the application interacts only with the cache. On a miss, the cache gets the data from the database, stores it, and returns it to the application (sketch below).
Pros & Cons:
- simplifies application code
- requires a loader plugin so the cache itself can fetch data from the database
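A read-through sketch: the application only talks to the cache object, and on a miss the cache calls a loader to hit the backing store. The dict-based store and the load_product helper are illustrative stand-ins, not a real cache API:

```python
class ReadThroughCache:
    def __init__(self, loader):
        self._store = {}       # in-memory stand-in for the cache
        self._loader = loader  # plugin that knows how to query the database

    def get(self, key):
        if key not in self._store:
            # Miss: the cache (not the application) loads from the database.
            self._store[key] = self._loader(key)
        return self._store[key]

def load_product(product_id):
    # Hypothetical database read.
    return {"id": product_id, "price": 9.99}

products = ReadThroughCache(load_product)
print(products.get(42))  # miss: loaded via the plugin, then cached
print(products.get(42))  # hit: served from the cache
```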
3. Write through.
The application writes data to the cache. The cache immediately writes the data through to the database (sketch below).
Pros & Cons:
- data consistency between cache and database
- no data loss in case of a cache crash
- higher latency for write operations
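A write-through sketch: every write updates the cache, which synchronously persists to the database before the write completes. The dicts stand in for a real cache and database:

```python
class WriteThroughCache:
    def __init__(self, database):
        self._store = {}
        self._database = database

    def put(self, key, value):
        self._store[key] = value     # 1. update the cache
        self._database[key] = value  # 2. synchronously persist to the database
        # Only now is the write acknowledged: consistent, but higher write latency.

    def get(self, key):
        return self._store.get(key)

db = {}  # stand-in for a real database
orders = WriteThroughCache(db)
orders.put("order:1", {"total": 30})
assert db["order:1"] == orders.get("order:1")  # cache and database agree
```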
4. Write back.
The application writes data to the cache. The cache writes the data to the database asynchronously (sketch below).
Pros & Cons:
- low latency
- less load on the database
- tolerant of database failures
- data can be lost if the cache crashes before flushing
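A write-back (write-behind) sketch: writes are acknowledged as soon as the cache is updated, and a background thread flushes them to the database later. In-memory dicts and a queue stand in for real components; anything still pending is lost if the cache process crashes:

```python
import queue
import threading
import time

class WriteBackCache:
    def __init__(self, database, flush_interval=1.0):
        self._store = {}
        self._database = database
        self._pending = queue.Queue()
        flusher = threading.Thread(target=self._flush_loop, args=(flush_interval,), daemon=True)
        flusher.start()

    def put(self, key, value):
        self._store[key] = value      # fast: only the cache is touched
        self._pending.put((key, value))

    def _flush_loop(self, interval):
        while True:
            time.sleep(interval)
            while not self._pending.empty():
                key, value = self._pending.get()
                self._database[key] = value  # deferred, batched persistence

db = {}
metrics = WriteBackCache(db)
metrics.put("metric:cpu", 0.87)  # returns immediately; db is not yet updated
time.sleep(1.5)                  # after a flush cycle the database catches up
print(db)
```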
5. Write around.
The application writes data directly to the database. Only data that is read ends up in the cache (sketch below).
Pros & Cons:
- suited for data written once and read infrequently (the cache stores only data that is actually read)
- reading recently written data causes a cache miss (higher latency)
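A write-around sketch: writes go straight to the database and skip the cache, while reads follow the cache-aside path, so only data that is actually read gets cached. The dicts are stand-ins for the cache and database:

```python
cache = {}
db = {}

def write_log(log_id, entry):
    db[log_id] = entry        # write directly to the database, bypassing the cache

def read_log(log_id):
    if log_id in cache:
        return cache[log_id]  # hit
    entry = db[log_id]        # miss: the first read after a write always hits the database
    cache[log_id] = entry     # cached only because it was read
    return entry

write_log("log:1", "service started")
print(read_log("log:1"))  # miss (recently written), then cached
print(read_log("log:1"))  # hit
```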
A perfect caching strategy doesn't exist. It all depends on the data and data access patterns.
An excellent general-purpose strategy for read-intensive applications is cache aside. Memcached and Redis are widely used caches for implementing it.
A write-back strategy is a good choice if you have a write-heavy workload. Write-around could be a good fit for data written once and rarely read (e.g., real-time logs).
You can read the full article about caching in distributed systems (second part) in the latest issue of
@EngPolymathic