I think AMD's lack of success comes down to two questions:
How long does it take to get the thing going?
Does the thing actually do the thing?
Well, last time it took me, @Lantos1618 and @alwaysallison, each of us with a lot of devops experience, nearly 3 days to get a node of 8 mi350x (192gb each) running. And last time I checked, in January, it didn’t run any faster in fp8 vs bf16.
And yes it required an AMD dev pushing stuff over the weekend.
With the Nvidia equivalent it would have taken only 1 of us under an hour to get it going, and it would have run 2-4x faster in 8-bit or 4-bit.
So to answer the question for AMD, the thing only does like half the thing and takes forever to get going.
I don't think this happens due to a lack of talent or IQ at AMD; the hardware itself is phenomenal. This will be weird for me to say, but I think it mainly stems from bad management not correcting bad engineering.
If I were in charge I would just ban the use of these zip files for individual customers, and force everyone to actually use the repo and get changes properly reviewed and merged before the customer is gone. It’s kinda sad how much this bad engineering practice damages the company, because otherwise the hardware itself is phenomenal.
P.S. Sorry, it was mi300.
Anyway, I have a test.
Can you fit 2-3 commands in a single standard tweet, including python3 -m venv meow && pip install vllm etc., to run the most popular Kimi K2 model properly with int4 weights and everything? On ANY AMD provider. Does it JUST F* WORK?
Nov 8, 2025 · 1:58 PM UTC
