ψ(▼へ▼メ)~ tensor compilers and tensor cores ~(Ψ▼ー▼)∈ no royal road. in open source we trust. please show me the code.

a computer in the cloud
Joined April 2024
updates to j4orz.ai/mlsysapp/. working on the runtime and eager kernels now. picograd is taking longer than other "hobby" autograds i've seen, but our plan is to be the *definitive* resource on building your own pytorch. we agree with @karpathy that course building is a very technical process that requires the pedagogical progression to be just right throughout the entire book: each step not too trivial, not too challenging. the goal is to be the llm201 course on karpathy's starfleet academy! we are early in our journey — if you are interested in helping out please come join us in the @GPU_MODE discord under the #singularity-systems work group 🖤
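the build-your-own-pytorch exercise described above starts with a scalar reverse-mode autograd engine. a minimal sketch of the core idea (the `Value` class and method names here are illustrative, in the micrograd style — not picograd's actual API):

```python
# minimal reverse-mode autograd over scalars: the kernel of every
# "build your own pytorch" tutorial. names are illustrative only.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule in reverse
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a   # c = 8, dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
```

eager kernels and a runtime then replace the python floats with device buffers, but the taping-and-reverse-traversal structure stays the same.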
$MSFT CEO Satya just made one of the most revealing comments of the entire AI cycle when he said Microsoft has $NVDA GPUs sitting in racks that cannot be turned on because there is not enough energy to feed them. The real constraint is not compute but power & data center space.

This is exactly why access to powered data centers has become the new leverage point. If compute is easy to buy but power is hard to get, the leverage moves to whoever controls energy & infrastructure. Every new data center that $MSFT, $GOOGL, $AMZN, $META & $ORCL are trying to build needs hundreds of megawatts of steady power. Getting that energy online now takes years, which means the players who locked in power early & built vertically across the stack are the ones with real control. Hyperscaler growth is no longer defined by how many GPUs they can buy but by how quickly they can energize new capacity.

Satya's other point about not wanting to overbuy one generation of GPUs matters just as much. The refresh cycle is shortening as Nvidia releases faster chips every year, which means the useful life of a GPU now depends on how quickly it can be deployed into production. When power & space are delayed, that GPU loses value before it ever produces a dollar of compute revenue.

Satya just validated why my DCA plan remains overweight in the AI Utility theme. The AI economy will scale at the rate power comes online, not at the rate chips improve. The next phase of AI infrastructure growth will belong to whoever can energize capacity faster than demand expands. Power has become the pricing layer of intelligence: $IREN, $CIFR, $NBIS, $APLD, $WULF, $EOSE, $CRWV
the broadcom arm soc on the pi5 doesn't support pcie atomics, which rdna userspace uses through the kfd kernelspace driver for queues. last hope for the pi before switching back to darwin or proper linux is hacking resizable bar for tinygrad's am pciiface.
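for context on the atomics constraint: pcie devices advertise AtomicOp support in the Device Capabilities 2 (DevCap2) register, which is what `lspci -vv` prints as AtomicOpsCap. a hedged sketch of decoding those bits from a raw DevCap2 value — the bit positions come from the pcie base spec, while the function name and return shape are made up for illustration:

```python
# decode AtomicOp support bits from a PCIe Device Capabilities 2 (DevCap2)
# register value. bit positions follow the PCIe base spec; the function
# name and dict layout are illustrative, not any real driver's API.
def decode_atomic_caps(devcap2: int) -> dict:
    return {
        "routing": bool(devcap2 & (1 << 6)),  # AtomicOp Routing Supported
        "comp32":  bool(devcap2 & (1 << 7)),  # 32-bit AtomicOp Completer
        "comp64":  bool(devcap2 & (1 << 8)),  # 64-bit AtomicOp Completer
        "cas128":  bool(devcap2 & (1 << 9)),  # 128-bit CAS Completer
    }

# e.g. a root port advertising routing plus 32/64-bit completion
caps = decode_atomic_caps(0b0111000000)
```

if these bits are clear on the host's root port, atomics issued by the gpu fail regardless of what the endpoint itself supports — which is the pi 5 situation described above.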
we do both
it enumerates after enabling pcie in pi os's boot firmware config
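for anyone reproducing this: on raspberry pi os the external pcie connector is switched on with a dtparam in the boot firmware config, per raspberry pi's documentation (the gen-speed line is optional):

```ini
# /boot/firmware/config.txt
dtparam=pciex1         ; enable the external PCIe x1 connector
dtparam=pciex1_gen=3   ; optional: request Gen 3 signalling
```

after a reboot the device should show up in `lspci` output.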