Breakdown of the AWS outage in simple words:

1. Sunday night, a DNS problem hit AWS - the DynamoDB endpoint was lost.
2. This meant services couldn't find DynamoDB (a database that stores tons of data).
3. AWS fixed the DNS issue in about 3 hours.
4. But then EC2 (the system that creates virtual servers) broke, because it needs DynamoDB to work.
5. Then the system that checks whether network load balancers are healthy also failed.
6. This crashed Lambda, CloudWatch, SQS, and 75+ other services - everything that needed network connectivity.
7. This created a chain reaction: servers couldn't talk to each other, new servers couldn't start, everything got stuck.
8. AWS had to intentionally slow down EC2 launches and Lambda functions to prevent total collapse.
9. Recovery took 15+ hours as they fixed each broken service while clearing massive backlogs of stuck requests.

This outage impacted Snapchat, Roblox, Fortnite, the McDonald's app, Ring doorbells, banks, and 1,000+ more websites.

This all happened in one AWS region (us-east-1). This is why multi-region architecture isn't optional anymore (rough sketch below).

Oct 21, 2025 · 12:10 AM UTC

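To make that last point concrete, here's a rough sketch of what region fallback can look like on the client side. Everything in it is illustrative (table name, key, regions), and it assumes the table is already replicated to a second region, e.g. via DynamoDB Global Tables:

# Sketch: a DynamoDB read that falls back to a replica region when the
# primary endpoint is unreachable. Table/key names and regions are
# illustrative; assumes cross-region replication already exists.
import boto3
from botocore.exceptions import ConnectTimeoutError, EndpointConnectionError

REGIONS = ["us-east-1", "us-west-2"]  # primary first, then fallback

def get_order(order_id: str) -> dict:
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region)
        try:
            resp = client.get_item(
                TableName="orders",
                Key={"order_id": {"S": order_id}},
            )
            return resp.get("Item", {})
        except (EndpointConnectionError, ConnectTimeoutError) as exc:
            last_error = exc  # DNS failure or dead endpoint: try next region
    raise RuntimeError("all regions failed") from last_error

The client part is the easy bit; as the replies below point out, replication and consistency are where the real complexity lives.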
Replying to @brankopetric00
Multi-region adds so much complexity. If your app is not mission critical, I'm not sure it's worth the trouble. Multi-AZ, yes of course. But it's very rare for an entire region to go down. Tradeoffs, tradeoffs.
Correct. 🤌
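For contrast, multi-AZ is usually a managed setting rather than an architecture project. A minimal sketch with RDS, where a single flag gives you a synchronous standby in another availability zone (all identifiers made up):

# Sketch: multi-AZ on RDS is one flag; AWS keeps a synchronous standby
# in a second availability zone and fails over automatically.
# Identifiers are illustrative; use Secrets Manager for real credentials.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    DBInstanceClass="db.t3.medium",
    Engine="postgres",
    MasterUsername="appuser",
    MasterUserPassword="change-me",
    AllocatedStorage=100,
    MultiAZ=True,  # standby replica in a second AZ
)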
Replying to @brankopetric00
Why couldn't they migrate resources from us-east-1 to us-west-1 during the outage? I thought high availability in the cloud was a fail-safe to prevent single points of failure.
Their services depended on services in us-east-1, so they had to fix the issues there.
Replying to @brankopetric00
Look for AWS to start courting all the talent they laid off/fired/pissed off over the last five years. Automation isn't foolproof. The environment it works in gets a vote, and very few companies that work at this scale hire Chaos Monkeys.
Replying to @brankopetric00
I wonder what the total cost was if you sum up all the customers' outages. Also, companies' uptime levels would go down quite a bit.
Can't imagine...
Replying to @brankopetric00
Multi-region adds a shitload of complexity unless you reaaaallly need it. And if you dig into the root cause, it wasn't the lack of multi-region.
For AWS, it wasn't. For others depending on their us-east-1 region, it was. I agree that multi-region is super complex and involves more than just infrastructure teams. 👍
Replying to @brankopetric00
Genuine question: AWS always mentions their services are hosted multi-region, so I'm still wondering how one entire region going down takes all those services offline.
This impacted only services in us-east-1; services in other regions worked fine.
Replying to @brankopetric00
Does anyone have any idea what exactly that DNS issue was?
No specific details shared so far.
Replying to @brankopetric00
What DNS problem?
The DynamoDB endpoint was not reachable: internal systems couldn't resolve its hostname to an IP.
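Roughly, a failed lookup looks like this from a client's point of view. The hostname is the real regional endpoint; the failure behavior is a reconstruction, since no specific details have been shared:

# Sketch: what "couldn't resolve the hostname" looks like from a client.
# During the outage, a lookup like this would fail instead of returning IPs.
import socket

host = "dynamodb.us-east-1.amazonaws.com"
try:
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    print([info[4][0] for info in infos])  # resolved IP addresses
except socket.gaierror as exc:
    # gaierror ("getaddrinfo error") means the DNS answer never came back
    print(f"DNS resolution failed for {host}: {exc}")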
Replying to @brankopetric00
Multi-region wouldn't mitigate the issue unless you treat regions as completely isolated, starting from the front-end level, not just the backend level. The redundancy has to start with the entry API call going down two different pipelines to the database, with replication between both.
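A rough sketch of that idea: the entry point writes through two independent regional pipelines, so neither depends on the other's backend. Table and field names are made up; the managed version of this is roughly what DynamoDB Global Tables provide:

# Sketch: dual-pipeline writes from the entry point. Each region gets the
# write through its own client, so one region being down doesn't block
# the other. Names are illustrative.
import boto3

clients = {
    region: boto3.client("dynamodb", region_name=region)
    for region in ("us-east-1", "us-west-2")
}

def put_event(event_id: str, payload: str) -> dict:
    results = {}
    for region, client in clients.items():
        try:
            client.put_item(
                TableName="events",
                Item={"event_id": {"S": event_id}, "payload": {"S": payload}},
            )
            results[region] = "ok"
        except Exception as exc:
            results[region] = f"failed: {exc}"
    return results

Dual writes also raise conflict-resolution and consistency questions, which is part of why multi-region involves more than just infrastructure teams.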
Replying to @brankopetric00
And following yesterday's AWS us-east-1 outage... today's Google Trends strongly implies: uptime is often literally only on somebody's mind when they don't have it anymore 🫣
Replying to @brankopetric00
So hilarious. Millions of dollars of Fortune 500 services hinging on a cringe cheapoDB.
Replying to @brankopetric00
That's always been the argument for multi-region. It's on every org's nice-to-have list; it's management pushback on the associated cost that stops it from being implemented.
Replying to @brankopetric00
(one simple word: DNS)
Replying to @brankopetric00
One would think that with the amount of revenue these companies make, they would have set up multi-region architecture. Were they saving on costs by hosting in only one region?
Replying to @brankopetric00
@brankopetric00 Spot-on breakdown of the AWS us-east-1 chaos: a DNS flop snowballed into 75+ services down for 15 hrs. Lesson from subsea fibre: maybe "self-healing rings" with multi-region paths & auto-failover. Turn single-point fails into seamless reroutes!
Replying to @brankopetric00
Microservices were a huge mistake. The Cloud is an even bigger one!
Replying to @brankopetric00
AWS has known for at least 3-4 years that it has a strong dependence on that region. Two years ago someone restarted a DB in the Virginia region and everything crashed, including their own internal tools. I'm surprised that they weren't able to decouple.
Replying to @brankopetric00
The average dev will hype multi-region and a CDN for a static portfolio while big tech doesn't even bother 🫠
Replying to @brankopetric00
Exactly. DR and multi-region failover aren't optional once you hit production scale. You can automate deployments all day, but if your critical systems (databases, APIs, auth) all live in one region, you are one DNS glitch away from downtime. True resilience means distributing workloads, syncing data across regions, and testing your failover plan before an outage hits.
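For the failover piece, one common pattern is DNS-level failover with health checks. A hedged sketch with Route 53, where zone ID, domain, IPs, and health-check ID are all placeholders:

# Sketch: Route 53 failover routing. The PRIMARY record is served while
# its health check passes; otherwise DNS answers flip to SECONDARY.
# All IDs, names, and addresses are placeholders.
import boto3

route53 = boto3.client("route53")

def failover_record(identifier: str, role: str, ip: str, health_check: str | None):
    record = {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": identifier,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check:
        record["HealthCheckId"] = health_check
    return {"Action": "UPSERT", "ResourceRecordSet": record}

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000000000",
    ChangeBatch={"Changes": [
        failover_record("use1", "PRIMARY", "192.0.2.10", "hc-primary-id"),
        failover_record("usw2", "SECONDARY", "198.51.100.10", None),
    ]},
)

And the last sentence of the reply above is the part people skip: the failover path only counts if it's exercised before the outage.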
Replying to @brankopetric00
Auto-erotically running your control plane on your data plane seems like a contributor too.
Replying to @brankopetric00
fake news
Replying to @brankopetric00
And yet Amazon stock was up
Replying to @brankopetric00
Even multi-region won't always help: "Global services or features that rely on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables may also be experiencing issues."
Replying to @brankopetric00
How does a DNS "problem" hit anything? Doesn't Amazon have its own DNS? A smooth-running DNS server doesn't go sideways on its own. Something (or someone) "HIT" the DNS.
Replying to @brankopetric00
"This is why multi-region architecture isn't optional anymore." I'd contend it never was ...
Replying to @brankopetric00
So now AWS will earn more with multi-AZ deployments.
Replying to @brankopetric00
Seems like very fragile infrastructure. Billions lost.