ψ(▼へ▼メ)~ tensor compilers and tensor cores ~(Ψ▼ー▼)∈ no royal road. in open source we trust. please show me the code.

a computer in the cloud
Joined April 2024
Pinned Tweet
good article from a triton contributor and former gpu compiler lead at google, claiming, same as @__tinygrad__, that you can't tape out without having a framework (the ir), because ml deals with incomplete irs. i.e. tpus only succeed because of jax and xla.
j4orz retweeted
[ENG SUB] how it feels to use eager pytorch in 2025
j4orz retweeted
Hi @JeffDean, what’s the plan for releasing the code for this line of work? None of these papers so far seem to have released any code
An exciting new approach to continual learning, using nested optimization to enhance long-context processing.
j4orz retweeted
Replying to @RolandForTexas
You are a taker, not a maker. All you’ve done your whole life is take from the makers of the world. The zero-sum mindset you have is at the root of so much evil. Once you realize that civilization is not zero-sum and that it is about making far more than one consumes, then it becomes obvious that the path to prosperity for all is just let the makers make.

Regarding Tesla, the reality is that I have been given nothing. However, if I lead Tesla to become the most valuable company in the world by far and it stays that way for 5 years, shareholders voted to award me 12% of what is built. Anyone who wants to come along for the ride can buy Tesla stock. If Tesla “merely” becomes a $1.999 trillion company, I get nothing.

This is a great deal for shareholders, which is why they voted so overwhelmingly to approve this, for which I am immensely grateful. And they did so by a margin far greater than the one by which you won your political seat.
updates to j4orz.ai/mlsysapp/. working on the runtime and eager kernels now. picograd is taking longer than other "hobby" autograds i've seen, but our plan is to be the *definitive* resource on building your own pytorch. we agree with @karpathy that course building is a very technical process that requires the pedagogical progression to be just right throughout the entire book, making each step not too trivial and not too challenging. the goal is to be the llm201 course at karpathy's starfleet academy! we are early in our journey. if you are interested in helping out, please come join us in the @GPU_MODE discord under the #singularity-systems work group 🖤
j4orz retweeted
Technological innovation can be a form of participation in the divine act of creation. It carries an ethical and spiritual weight, for every design choice expresses a vision of humanity. The Church therefore calls all builders of #AI to cultivate moral discernment as a fundamental part of their work—to develop systems that reflect justice, solidarity, and a genuine reverence for life.
j4orz retweeted
Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and generation, so good chance diffusion is the biggest winner overall. Also means that the ratio of compute to memory bandwidth will increase.
j4orz retweeted
Replying to @StefanoErmon
Tesla is using single step diffusion for world model generation x.com/i/grok/share/CCHc5AW6U…

80% of the work is finding the spec
j4orz retweeted
AND Kimi also moves to the Frontier Tier with the release of K2 thinking
A tier list of China's top 19 open model builders. Who did we miss?

At the frontier
* DeepSeek
* Qwen

Close competitors
* Moonshot AI (Kimi)
* Zhipu / Z AI

Noteworthy
* StepFun
* Tencent (Hunyuan)
* RedNote (Xiaohongshu)
* MiniMax
* OpenGVLab / InternLM
* Skywork

On the rise
* ByteDance Seed
* OpenBMB
* Xiaomi (MiMo)
* Baidu (ERNIE)

Honorable Mentions
* Multimodal Art Projection
* Alibaba International Digital Commerce Group
* Beijing Academy of Artificial Intelligence (BAAI)
* inclusionAI
* Pangu (Huawei)

I learned a lot from these. We have so much more we need to do to understand how their AI ecosystem works.
looking at the backward rules for an autodiff has the same beauty as looking at a lisp interpreter. with the chain rule you can describe the world, just as with eval you can simulate any turing-complete language.
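the parallel can be made concrete: a reverse-mode autodiff is a handful of backward rules plus a tape walk, the same way a lisp interpreter is a handful of eval cases plus recursion. a minimal sketch below (names like `Value` are illustrative, not from picograd or any particular framework):

```python
# A minimal reverse-mode autodiff: each op records a backward rule,
# and .backward() walks the tape in reverse applying the chain rule,
# much like a lisp eval dispatching on expression type.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # backward rule set by the op

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_rule():           # d(a+b)/da = 1, d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward_rule
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_rule():           # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward_rule
        return out

    def backward(self):
        # Topologically order the tape, then apply each rule once.
        topo, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                topo.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(3.0)
y = Value(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

each operator is one backward rule; everything else is the chain rule applied along the tape.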
Datacenters in space is the solar roadways of the 2020s.
j4orz retweeted
Replying to @natolambert
Actually I think it was a pretty eventful Fall so far. E.g., Qwen3-Next, DeepSeek V3.2, GLM 4.6, MiniMax-M2, Kimi Linear
j4orz retweeted
defending today 🥲
j4orz retweeted
If you'd like to win your own Dell Pro Max with GB300, we're launching a new kernel competition with @NVIDIAAI @sestercegroup @Dell to optimize NVF4 kernels on B200. 2025 has seen a tremendous rise of pythonic kernel DSLs; we got on-prem hardware to make reliable ncu benchmarking available to all, and we hope the best kernel DSL and the best kernel DSL author win.