Breakdown of the AWS outage in simple terms
1. Sunday night, a DNS problem hit AWS - the DNS record for the DynamoDB endpoint in us-east-1 stopped resolving.
2. This meant services couldn't find DynamoDB (a database that stores tons of data) - see the DNS sketch after this list.
3. AWS fixed the DNS issue in about 3 hours.
4. But then EC2 (the system that creates virtual servers) broke because it needs DynamoDB to work.
5. Then the system that checks if network load balancers are healthy also failed.
6. This crashed Lambda, CloudWatch, SQS, and 75+ other services - basically everything that depended on that networking layer.
7. It turned into a chain reaction - servers couldn't talk to each other, new servers couldn't start, and everything got stuck.
8. AWS had to intentionally slow down new EC2 launches and Lambda processing to prevent a total collapse (see the backoff sketch after this list).
9. Recovery took 15+ hours as they fixed each broken service while clearing massive backlogs of stuck requests.
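To make steps 1-2 concrete: "couldn't find DynamoDB" means DNS lookups for the regional endpoint stopped returning IP addresses, so clients had nowhere to send requests. A minimal Python sketch of that kind of check, using the public endpoint name (the internal records AWS actually broke and fixed aren't visible from outside):

```python
import socket

# Public regional DynamoDB endpoint; the broken DNS record during the
# outage belonged to us-east-1.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def endpoint_resolves(hostname: str) -> bool:
    """Return True if DNS hands back at least one IP for the hostname."""
    try:
        addresses = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        return len(addresses) > 0
    except socket.gaierror:
        # Roughly what every dependent service hit that night: the lookup
        # failed, so there was no IP to connect to.
        return False

if __name__ == "__main__":
    print(f"{ENDPOINT} resolves: {endpoint_resolves(ENDPOINT)}")
```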
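Step 8 is basically rate limiting: AWS throttled new launches so recovering systems weren't flooded while they caught up. From a customer's side, that shows up as throttling errors you retry with exponential backoff. A rough boto3 sketch, assuming the standard RequestLimitExceeded / InsufficientInstanceCapacity error codes; the AMI id is a placeholder:

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")

def launch_with_backoff(max_attempts: int = 5):
    """Try to launch one instance, backing off whenever EC2 throttles us."""
    for attempt in range(max_attempts):
        try:
            # Parameters are illustrative only; ami-12345678 is a placeholder.
            return ec2.run_instances(
                ImageId="ami-12345678",
                InstanceType="t3.micro",
                MinCount=1,
                MaxCount=1,
            )
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("RequestLimitExceeded", "InsufficientInstanceCapacity"):
                raise  # some other failure; don't retry blindly
            # Exponential backoff with jitter so retries don't arrive in sync.
            time.sleep(min(2 ** attempt, 30) + random.random())
    raise RuntimeError("EC2 kept throttling; gave up after retries")
```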
This outage impacted Snapchat, Roblox, Fortnite, the McDonald's app, Ring doorbells, banks, and 1,000+ other apps and websites.
This all happened in one AWS region (us-east-1).
This is why multi-region architecture isn't optional anymore.
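In its simplest form, "multi-region" means keeping your data replicated to a second region and having a code path that fails over when the primary misbehaves. A minimal boto3 sketch, assuming a DynamoDB global table replicated to both regions; the table name and fallback region are placeholders:

```python
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError, EndpointConnectionError

PRIMARY_REGION = "us-east-1"    # where the outage happened
FALLBACK_REGION = "us-west-2"   # placeholder second region
TABLE_NAME = "orders"           # hypothetical global table replicated to both

# Short timeouts + few retries so a dead region fails fast instead of hanging.
FAST_FAIL = Config(connect_timeout=2, read_timeout=2, retries={"max_attempts": 1})

def get_item_with_failover(key: dict) -> dict:
    """Read from the primary region, falling back to the replica if it fails."""
    for region in (PRIMARY_REGION, FALLBACK_REGION):
        client = boto3.client("dynamodb", region_name=region, config=FAST_FAIL)
        try:
            return client.get_item(TableName=TABLE_NAME, Key=key)
        except (ClientError, EndpointConnectionError):
            continue  # try the next region
    raise RuntimeError("both regions failed")

# Usage: get_item_with_failover({"order_id": {"S": "1234"}})
```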
Oct 21, 2025 · 12:10 AM UTC