A story about why using managed services in the Cloud is turbo nice when you're a small team
Production went down. Load balancer health checks failing. All instances marked unhealthy.
SSH'd into an instance:
- App: running
- CPU: 5%
- Memory: 40%
- Disk: 15%
- Network: fine
Manually hit health endpoint:
curl localhost/health
{"status": "ok"}
Worked perfectly.
Checked load balancer logs:
- Health check URL: /health
- Response: timeout
- Instance marked: unhealthy
The issue:
- Health endpoint responded in 100ms locally
- Load balancer timeout: 2 seconds
- Should be plenty of time
Then I noticed: Health check ran every 5 seconds.
App logged every health check. To a file. That file grew to 47 GB.
Every health check:
1. Opened 47 GB log file
2. Appended 1 line
3. Closed file
4. Took 3 seconds due to file size
5. Timed out
Fix: Disabled health check logging. Response time: back to 100ms.
Oct 25, 2025 · 6:39 PM UTC
