Can anyone actually explain, with hard facts, how it's possible AMD still doesn't meaningfully compete in the deep learning space? I understand they lack certain pieces of software. I do not understand how they could reasonably still lack it, though.

Nov 6, 2025 · 8:01 PM UTC

Replying to @Sentdex
Software and community
No, I want more than just general words. What software? Why would this software take 5+ years to build once it was clear NVIDIA would become the biggest public company of all time? The community would show up immediately if the software were there. I'd buy 10 AMD GPUs right now.
Replying to @Sentdex
They don't have the alien tech... 🤷‍♂️
Are we watching the same YouTubers lmaooo, I think I just heard someone propose this for the first time today.
Replying to @Sentdex
They do, just not at the consumer level
I hear this, to some slight extent, but the only evidence of it is money just sort of moving around via future deals. The entire tech sector is doing this money-moving thing; I'm curious whether any real, actual processing, besides payments, has occurred lol
Replying to @Sentdex
By waiting, AMD lets NVIDIA shoulder the huge R&D and mature the space. When standards solidify and open source grows, AMD can enter with competitive hardware at lower barriers. A smart late-mover play, though the software gap comes from underinvestment.
I mean, I could sorta see that, but tbh what are we talking about here? When is AMD gonna enter and sweep? Because right now NVIDIA is basically running the US economy. What's AMD waiting for?
Replying to @Sentdex
Building another CUDA is actually harder than it seems, because a large portion of the code has to be rewritten for different hardware in order to keep the API stable. Chris Lattner had a blog series on Mojo trying to demystify the CUDA effect.
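
To make that concrete, here's a minimal sketch of my own (not from the thread, and simplified). AMD's HIP intentionally mirrors the CUDA API, so source like this can often build with hipcc almost unchanged; the catch is that hardware assumptions are baked in everywhere. The WARP_SIZE of 32 below becomes a wavefront of 64 on AMD's CDNA parts, so every kernel like this has to be re-audited and re-tuned per architecture, and multiplying that by thousands of kernels is a big part of the "rewrite" cost:

    // Warp-level sum reduction written against the CUDA API.
    // HIP mirrors this API closely, but the warp-size assumption (32 on
    // NVIDIA, 64 on AMD CDNA) is baked into the shuffle loop below and
    // must be re-derived per architecture.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define WARP_SIZE 32  // 64 on AMD CDNA: a silent perf/correctness hazard

    __global__ void warp_sum(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float v = (i < n) ? in[i] : 0.0f;
        // Tree reduction within one warp via register shuffles.
        for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffff, v, offset);
        // Lane 0 of each warp holds that warp's partial sum.
        if (threadIdx.x % WARP_SIZE == 0) atomicAdd(out, v);
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;
        *out = 0.0f;
        warp_sum<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("sum = %.0f (expect %d)\n", *out, n);
        cudaFree(in); cudaFree(out);
        return 0;
    }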
I do not doubt it's very difficult. But we're literally talking about becoming the most valuable public company here. Nothing is too hard when those are the stakes.
Replying to @Sentdex
We've been working on it for 3 years. The software is super hard. Give it 2 more.
Replying to @Sentdex
Besides the software, as you mention, it's a scale-up issue; they've yet to release a single rack-scale solution. Once they ship one next year, we'll see whether the software factor is just lagging too far behind or whether adoption explodes.
Replying to @Sentdex
There's too much money at play for someone not to rise up and take some of that market share. It's also possible the AMD CEO has conflicts of interest (just my opinion). I'm sure Thanksgiving dinners would be pretty rough if she competed with her family.
Replying to @Sentdex
AMD recently hired Sharon Zhou as a VP of AI. I think she is working right now to make AMD bigger on the training side.
Open source isn’t just a philosophy, it’s a force multiplier for AI progress. @realSharonZhou, VP of AI at AMD, shares why open ecosystems are critical to unlocking the full potential of generative AI. From enabling community-driven innovation to fueling a virtuous cycle of data and model improvement, open source is shaping a more inclusive, more capable AI future. #AdvancingAI
Replying to @Sentdex
It's Temu NVIDIA bro, always has been.
Replying to @Sentdex
I could spend hours on this topic alone.
Replying to @Sentdex
It goes far beyond missing software. It's an ecosystem advantage that NVIDIA has fortified over more than 15 years, fueled by its massive scale and AMD's split priorities. They are just really far behind. @__tinygrad__ are doing some interesting work with AMD cards though!
Replying to @Sentdex
Agree, it's incomprehensible. Just reproducing CUDA should not take so long, and they could do it in the open to benefit from other people helping. Or is the number of engineers capable of doing this quickly really so small?
Replying to @Sentdex
AMD had better hardware than Nvidia around 2013. Then AMD released the horrible Bulldozer processor and banked everything on Vega, which took forever. Nvidia dropped Pascal, leaving AMD in the dust. Lisa Su finally took over, had no choice but to let Vega happen, and it was awful. They released Ryzen and focused on CPUs. They overtook Intel in market share, and suddenly they needed to play catch-up on GPUs. Had they not dropped the ball with Vega back then, they probably would've had some alternative to the H100 in 2022 and we'd have a duopoly.
Replying to @Sentdex
Perhaps they could write their own CUDA implementation, like what was done for Java: keep the API, but with AMD code under the hood, and replicate the aspects that make CUDA so successful.
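
For illustration, a heavily simplified sketch of that idea (mine, hypothetical): export the CUDA runtime's entry points but dispatch to AMD's HIP underneath. This is roughly what AMD's HIP does at the source level and what ZLUDA attempts at the binary level. The hip* functions below are real HIP API calls; the hard part in practice isn't these thin wrappers but matching CUDA's exact semantics and performance, plus the thousands of library entry points (cuBLAS, cuDNN, NCCL, ...) that sit on top:

    // Hypothetical CUDA-runtime-compatible shim backed by AMD's HIP.
    #include <hip/hip_runtime.h>

    extern "C" {

    // Pretend CUDA and HIP error codes map 1:1; a real shim needs a
    // full translation table.
    typedef int cudaError_t;

    cudaError_t cudaMalloc(void** ptr, size_t size) {
        return (cudaError_t)hipMalloc(ptr, size);
    }

    cudaError_t cudaMemcpy(void* dst, const void* src, size_t n, int kind) {
        // cudaMemcpyKind enum values line up with hipMemcpyKind.
        return (cudaError_t)hipMemcpy(dst, src, n, (hipMemcpyKind)kind);
    }

    cudaError_t cudaFree(void* ptr) {
        return (cudaError_t)hipFree(ptr);
    }

    cudaError_t cudaDeviceSynchronize(void) {
        return (cudaError_t)hipDeviceSynchronize();
    }

    }  // extern "C"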
Replying to @Sentdex
The MI300X has had some more support in terms of tutorials and hackathons recently.
Replying to @Sentdex
OpenAI recently committed to buying a vast amount of AMD hardware; I think that counts! If you mean training specifically, it's about software and networking, both of which are advancing rapidly.
Replying to @Sentdex
"inference. $MSFT has built some toolkits to help convert CUDA models to $AMD's ROCm so that you could use it on an $AMD 300x, and they are getting a lot of inquiries about $AMD's path and the 400x and 450X: »We're actually working with AMD on that to see what we can do to maximize that.«
Replying to @Sentdex
Someone quoted this and I agree. Probably because the cost is low (a good thing) but the usability is also low. It took me 3 days to get my dual R9700s working properly.
Replying to @Sentdex
Indeed, the network effect. NVIDIA comes out with CAGRA search, so I, with an NVIDIA card, decide to use cuVS for it. Then I think, hmm, I can make it even faster by fusing some of my existing CUDA embedding model with cuVS, so I rewrite my own CUDA for that (me and Claude made lee101/gobed, a CUDA-enabled search engine: github.com/lee101/gobed). I don't have an AMD GPU, and not many other deep learning researchers do, so I don't care about AMD support; it's only really for me and my CUDA machines. This replays thousands of times over, with open-source researchers building on top of other CUDA libraries on their CUDA machines, e.g. all of these:

AI dump of fun CUDA words:
- cuBLAS: GPU-accelerated Basic Linear Algebra Subprograms (matrix/vector ops).
- cuSPARSE: sparse matrix operations (CSR/COO formats).
- cuSOLVER: dense and sparse linear solvers (LU, QR, Cholesky, eigenvalues).
- cuFFT: Fast Fourier Transforms on GPU.
- cuRAND: random number generation on GPU.
- cuTENSOR: high-performance tensor algebra (Einstein summation, contractions).
- cuDNN: deep neural network primitives (convolutions, RNNs, activations, etc.).
- cuBLASLt: "lightweight" version of cuBLAS with advanced heuristics and mixed precision.
- cuSPARSELt: sparse matrix acceleration for deep learning inference.

⚙️ Systems & runtime libraries:
- CUDA Runtime: high-level host/device management layer (kernel launches, memory copies).
- CUDA Driver API: lower-level control of GPU contexts and execution.
- NCCL (NVIDIA Collective Communication Library): multi-GPU / multi-node collective ops (all-reduce, broadcast, etc.).
- NVTX (NVIDIA Tools Extension): instrumentation markers for profiling with Nsight tools.
- NVML (NVIDIA Management Library): GPU monitoring, thermals, utilization, power management.
- NVRTC: runtime compilation of CUDA kernels (JIT).
- NPP (NVIDIA Performance Primitives): image, video, and signal processing primitives.
- Thrust: C++ STL-like parallel algorithms library (map/reduce/sort/etc.).

🧩 Domain-specific CUDA SDKs:
- AI / ML: cuDNN, cuTENSOR, TensorRT, cuSPARSELt, cuBLASLt.
- Data analytics: RAPIDS (cuDF, cuML, cuGraph, cuSpatial), NVTabular, DALI.
- Computer vision: NPP, VPI (Vision Programming Interface), CV-CUDA.
- 3D / simulation / physics: CUDA PhysX, Flex, Omniverse Kit SDK.
- Video / imaging: NVENC, NVDEC, NPP, DeepStream SDK.
- Rendering / graphics: OptiX (ray tracing), IndeX (volume visualization), RTXGI.
- Networking / HPC: NCCL, Magnum IO, GPUDirect RDMA, UCX, NVSHMEM.
- Autonomous machines: JetPack SDK, DriveWorks SDK, Isaac SDK.

🧬 Bioinformatics / scientific (your "BioCUDA" mention):
- nvBIO: NVIDIA bioinformatics library for DNA/RNA sequence alignment, assembly, etc.
- cuQuantum: GPU-accelerated quantum simulation (state vector + tensor network).
- cuDF / cuML (RAPIDS): applicable to bioinformatics data science workflows.
- Clara Parabricks: GPU-accelerated genomics pipeline (variant calling, alignment).
- BioCUDA (community term): often refers to custom CUDA kernels for genomics / molecular dynamics.
- AMBER / GROMACS / NAMD GPU builds: molecular dynamics engines using CUDA backends.

🧮 High-level ecosystems:
- RAPIDS: cuDF (DataFrame), cuML (ML), cuGraph, cuSpatial, cuCIM (imaging).
- TensorRT: inference optimization and deployment engine (on top of cuDNN + cuBLAS).
- DeepStream: video analytics framework for AI + IoT.
- Omniverse: RTX-based simulation + rendering platform using CUDA + OptiX + MDL.
- Magnum IO: suite for multi-GPU and multi-node data transfer (NCCL, NVSHMEM, UCX, GPUDirect).

🔧 Developer / tooling stack:
- Nsight Systems: system-wide performance analysis.
- Nsight Compute: kernel-level profiler.
- Nsight Graphics: graphics debugging/profiling.
- CUDA-GDB: CUDA debugger.
- CUDA-MEMCHECK: memory error detection tool.

💡 Other specialized libraries:
- CUTLASS: template library for building custom GEMMs (used by PyTorch).
- cuDLA: Deep Learning Accelerator runtime (for Jetson).
- NVInfer / TensorRT: neural network inference optimization.
- CV-CUDA: open-source CV primitives for cloud inference.
- cuQuantum / cuStateVec / cuTensorNet: quantum simulation primitives.
- cuPHY: 5G baseband physical-layer acceleration.
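
To see why that network effect is so sticky, here's a minimal example of my own (not from the thread) of what "building on top of a CUDA library" looks like. Porting even this toy means swapping cublasSgemm for its hipBLAS counterpart (hipblasSgemm), rebuilding, and re-validating, and a real project has hundreds of such call sites spread across the libraries listed above:

    // Toy cuBLAS usage: C = A * B for 2x2 column-major matrices.
    // Expected output: C = [23 31; 34 46].
    // Build with: nvcc gemm.cu -lcublas
    #include <cstdio>
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main() {
        const int n = 2;
        float hA[4] = {1, 2, 3, 4};  // A = [1 3; 2 4] (column-major)
        float hB[4] = {5, 6, 7, 8};  // B = [5 7; 6 8]
        float hC[4] = {0};

        float *dA, *dB, *dC;
        cudaMalloc(&dA, sizeof(hA));
        cudaMalloc(&dB, sizeof(hB));
        cudaMalloc(&dC, sizeof(hC));
        cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);

        cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
        printf("C = [%.0f %.0f; %.0f %.0f]\n", hC[0], hC[2], hC[1], hC[3]);

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }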
Replying to @Sentdex
I hear CUDA is a work of art?
Replying to @Sentdex
A MUST-read interview with a high-ranking $MSFT employee on data centers and what is happening right now ($NVDA / $AMD, liquid cooling, and HDDs):

1. The challenges $MSFT is facing right now are energy and liquid cooling. To improve its goodwill with municipalities, $MSFT is setting up wastewater treatment facilities near its data centers, which benefits the municipalities as well, not just $MSFT.

2. He mentions that they have been deploying a lot of $NVDA GB200s lately, though not as many as $META or X. There were some design challenges initially, but uptake with their customers is now pretty good. By and large, H100s are probably still their biggest pool.

3. They are seeing a slowdown in training relative to inference. Over the last 3-4 months there has been increased interest in saving costs with inference. $MSFT has built some toolkits to help convert CUDA models to $AMD's ROCm so that you can run them on an $AMD MI300X, and they are getting a lot of inquiries about $AMD's path and the 400X and 450X: »We're actually working with AMD on that to see what we can do to maximize that.«

4. According to him, $MSFT hasn't really brushed off OpenAI, but OpenAI is partnering with others and trying to get as much compute as it can. He questions how financially sustainable that becomes, as OpenAI is still hemorrhaging money, though its balance sheet is actually getting better month over month.

5. He doesn't think you can overbuild capacity at this point, as data centers take time to set up. He believes the tipping point of overbuild will come in 2029 or 2030, at least according to their projections.

6. He also clarifies that $MSFT is open to working with former bitcoin miners who want to convert to AI. Still, the biggest challenge with them is water and its availability, as many of their sites were not built for liquid cooling.

7. He does mention that there is an HDD shortage, because a few years ago many HDD manufacturers cut back production to focus on SSDs. That said, there is a ceiling on what $MSFT's Azure is willing to pay for hard drives, and they are pushing back against Seagate, Western Digital, Toshiba, and Samsung. He believes capacity is being added and that things will be better in the first half of 2026.

Found on @AlphaSenseInc