DeepSeek-V3.2 shows:
- Chinese chips are rising: Day-0 support for Huawei Ascend & Cambricon;
- ML compiler: DeepSeek uses TileLang, which lets you write kernels in Python and compile them to optimized code for diverse hardware. E.g., 80 lines of Python can reach 95% of the performance of FlashMLA (hand-written CUDA).
Under the hood of TileLang is TVM, an ML compiler I spent years working on with the great open-source community. As the hardware landscape diversifies (Nvidia GPUs, Chinese chips, and inference-focused chips), ML compilers will shine again.