Cerebras Code powered by open-source GLM model
Cerebras Code just got an UPGRADE. It's now powered by GLM 4.6.
Pro plans ($50): 300k ▶️ 1M TPM @ 24M tokens/day
Max plans ($200): 400k ▶️ 1.5M TPM @ 120M tokens/day
Fastest GLM provider on the planet at 1,000 tokens/s and at 131K context. Get yours before we run out 👇
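Back-of-envelope math on those limits (a sketch using only the advertised numbers above; the "sustained peak" framing is an illustrative assumption, not how real usage behaves):

```python
# Rough plan math from the advertised Cerebras Code limits above.
# Figures come from the announcement; sustained peak use is assumed.

plans = {
    "Pro ($50)": {"tpm": 1_000_000, "daily_tokens": 24_000_000},
    "Max ($200)": {"tpm": 1_500_000, "daily_tokens": 120_000_000},
}

for name, p in plans.items():
    minutes_at_peak = p["daily_tokens"] / p["tpm"]
    print(f"{name}: daily quota lasts ~{minutes_at_peak:.0f} min at full TPM")
# Pro ($50): ~24 min at peak; Max ($200): ~80 min at peak
```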
Interesting trend: strong competing alternatives to NVIDIA
Jensen is getting desperate. China is shutting off a major source of demand for Nvidia chips: transshipment to Chinese SOE data centers. I have a pretty solid source that just 6,000 out of 100k racks in Meta's Hyperion DC will be Nvidia cards. Google TPUs are supplying most of its future demand; same with Amazon and Trainium. I've said this many times now: the absolute compute per card should not be overvalued. You can achieve the same compute with more lower-cost cards and get better performance through more memory plus a faster interconnect. On top of that, the US data center build-out is facing a logjam due to energy and supply chain issues; check the lead times on diesel generators and gas turbines. I overstated the optical transceiver issue, since the build-out hit supply chain constraints before we even got there. Jensen knows Nvidia is facing an upcoming cliff. Altman sees the same issue, hence all the begging for government help. At the end of the day, Chinese AI labs have shown you can build leading models without unlimited compute, so why do we keep proclaiming a build-out speed that's not achievable?
Interesting GEO (generative engine optimization) opportunities
The most cited websites by AI models (citation frequency):
1. reddit - 40.1%
2. wikipedia - 26.3%
3. youtube - 23.5%
4. google - 23.3%
5. yelp - 21.0%
6. facebook - 20.0%
7. amazon - 18.7%
8. tripadvisor - 12.5%
9. mapbox - 11.3%
10. openstreetmap - 11.3%
11. instagram - 10.9%
12. mapquest - 9.8%
13. walmart - 9.3%
14. ebay - 7.7%
15. linkedin - 5.9%
16. quora - 4.6%
17. homedepot - 4.6%
18. yahoo - 4.4%
19. target - 4.3%
20. pinterest - 4.2%
Source: Semrush, as of June 2025.
Interesting perspective on the weight tensors of LLMs
LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
Continual learning via nested optimization
Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: goo.gle/47LJrzI @GoogleAI
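The announcement doesn't detail Hope's internals, so here is a generic illustration of "models as nested optimization problems": an inner loop adapts fast weights on recent data while an outer loop slowly absorbs what the inner loop learned (closest in spirit to a Reptile-style meta-update). The toy objective, learning rates, and loop structure are all assumptions for illustration, not Google's method.

```python
import numpy as np

# Toy nested optimization: fast inner adaptation inside a slow outer update.
# Purely illustrative; not the Hope architecture or Nested Learning itself.

rng = np.random.default_rng(0)
slow = np.zeros(4)                       # slowly updated "outer" parameters

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

for outer_step in range(50):             # outer (slow) optimization level
    X = rng.normal(size=(32, 4))         # a fresh "task" each outer step
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=32)

    fast = slow.copy()                   # inner level starts from slow weights
    for _ in range(5):                   # inner (fast) optimization level
        fast -= 0.1 * grad(fast, X, y)

    slow += 0.05 * (fast - slow)         # slow level drifts toward inner solution

print("final loss:", float(np.mean((X @ slow - y) ** 2)))
```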
SOTA for Kimi K2 Thinking, not just open-source SOTA
MoonshotAI has released Kimi K2 Thinking, a new reasoning variant of Kimi K2 that achieves #1 in the Tau2 Bench Telecom agentic benchmark and is potentially the new leading open weights model.
Kimi K2 Thinking is one of the largest open weights models ever, at 1T total parameters with 32B active. K2 Thinking is the first reasoning model release within @Kimi_Moonshot's Kimi K2 model family, following the non-reasoning Kimi K2 Instruct models released in July and September 2025.
Key takeaways:
➤ Strong performance on agentic tasks: Kimi K2 Thinking achieves 93% on 𝜏²-Bench Telecom, an agentic tool use benchmark where the model acts as a customer service agent. This is the highest score we have independently measured. Tool use in long-horizon agentic contexts was a strength of Kimi K2 Instruct, and this new Thinking variant appears to make substantial gains.
➤ Reasoning variant of Kimi K2 Instruct: As per its naming, the model is a reasoning variant of Kimi K2 Instruct. It has the same architecture and the same number of parameters (though different precision) as Kimi K2 Instruct and, like K2 Instruct, supports only text as an input (and output) modality.
➤ 1T parameters but INT4 instead of FP8: Unlike Moonshot's prior Kimi K2 Instruct releases, which used FP8 precision, this model has been released natively in INT4 precision; Moonshot used quantization-aware training in the post-training phase to achieve this. As a result, K2 Thinking is only ~594GB, compared to just over 1TB for K2 Instruct and K2 Instruct 0905, which translates into efficiency gains for inference and training. A potential reason for INT4 is that pre-Blackwell NVIDIA GPUs do not support FP4, making INT4 more suitable for achieving efficiency gains on earlier hardware.
Our full set of Artificial Analysis Intelligence Index benchmarks is in progress and we will provide an update as soon as it is complete.
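The quoted sizes roughly check out with simple weight-storage arithmetic (a sketch; the gap between the naive INT4 number and the reported ~594GB plausibly comes from components kept at higher precision):

```python
# Rough weight-storage math for a 1T-parameter model at different precisions.
# Parameter count is from the post; storing every parameter at the quoted
# precision is a simplifying assumption.

params = 1_000_000_000_000          # 1T total parameters
GB = 1024**3

for name, bits in [("FP8", 8), ("INT4", 4)]:
    size_gb = params * bits / 8 / GB
    print(f"{name}: ~{size_gb:.0f} GB")
# FP8:  ~931 GB -> consistent with "just over 1TB" (decimal) for K2 Instruct
# INT4: ~466 GB -> reported ~594 GB; higher-precision layers explain the gap
```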
Autonomous agent for data science
DS-STAR is a state-of-the-art data science agent designed to autonomously solve complex data science problems. It automates tasks from analysis to data wrangling across diverse data types to achieve top performance on challenging benchmarks. Learn more: goo.gle/3WHBMNS
Lesson learned on IP bandwidth bottlenecks
Fixed the token generation speed on platform.moonshot.ai & OpenRouter! Shocking plot twist: The bottleneck was IP bandwidth, NOT GPU count 😱 We threw compute at the problem. The answer was thicker pipes, not more cards. Lesson: When serving LLMs, check your network limits before your GPU count 🔥
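A back-of-envelope check of how network egress can bind before GPUs do. All numbers below are illustrative assumptions, not Moonshot's figures:

```python
# Illustrative capacity math: when does egress bandwidth, not GPU throughput,
# cap an LLM serving deployment? Every number here is an assumption.

concurrent_streams = 50_000      # simultaneous user streams (assumed)
tokens_per_sec     = 40          # per-stream generation rate (assumed)
bytes_per_token    = 400         # token text + SSE/JSON framing (assumed)
nic_gbps           = 100         # available egress bandwidth (assumed)

needed_gbps = concurrent_streams * tokens_per_sec * bytes_per_token * 8 / 1e9
print(f"needed egress: {needed_gbps:.1f} Gbps vs available: {nic_gbps} Gbps")
# ~6.4 Gbps here, which looks fine -- but per-token framing overhead, retries,
# and proxy hops multiply this quickly; measure real bytes per response.
```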
New Kimi model
🚀 Hello, Kimi K2 Thinking! The open-source thinking agent model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling: scaling both thinking tokens and tool-calling turns. K2 Thinking is now live on kimi.com in chat mode, with full agentic mode coming soon. It is also accessible via API.
🔌 API is live: platform.moonshot.ai
🔗 Tech blog: moonshotai.github.io/Kimi-K2…
🔗 Weights & code: huggingface.co/moonshotai
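Moonshot's platform exposes an OpenAI-compatible API, so a call might look like the sketch below. The base URL and the "kimi-k2-thinking" model name are assumptions inferred from the announcement, not verified identifiers; check platform.moonshot.ai for the real values.

```python
from openai import OpenAI  # Moonshot's API is OpenAI-compatible

# Endpoint and model name below are assumptions; confirm on platform.moonshot.ai.
client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",   # assumed endpoint
)

resp = client.chat.completions.create(
    model="kimi-k2-thinking",                # assumed model identifier
    messages=[{"role": "user", "content": "Plan a 3-step web research task."}],
)
print(resp.choices[0].message.content)
```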
Good to know my bills will keep going down
LLM token prices are collapsing fast, and the collapse is steepest at the top end. The least "intelligent" models get about 9× cheaper per year, mid-tier models drop about 40× per year, and the most capable models fall about 900× per year. It was the same with Moore's Law, the best contemporary example of Jevons paradox: "This extraordinary collapse in computing costs – a billionfold improvement – did not lead to modest, proportional increases in computer use. It triggered an explosion of applications that would have been unthinkable at earlier price points." (a16z.substack.com/p/why-ac-is-cheap-but-ac-repair-is)
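Compounding those rates (a sketch using only the per-year factors quoted above; the starting price is an illustrative assumption) shows how quickly a frontier-tier price vanishes:

```python
# Compound the quoted annual price-drop factors over a few years.
# The $10-per-million-token starting price is an assumed illustration.

start_price = 10.0  # $/1M tokens (assumed starting point)
rates = {"low tier": 9, "mid tier": 40, "frontier tier": 900}

for tier, factor_per_year in rates.items():
    for years in (1, 2):
        price = start_price / factor_per_year ** years
        print(f"{tier}: ${price:.6f}/1M tokens after {years}y")
# Frontier: ~$0.011 after 1y, ~$0.000012 after 2y -- effectively free, which
# is why usage explodes (Jevons paradox) rather than spend falling in step.
```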
KV-cache communication instead of natural language for agents talking to one another
Wow, language models can talk without words. A new framework, Cache-to-Cache (C2C), lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation. It fuses cache representations via a neural projector and gating mechanism for efficient inter-model exchange. The payoff: up to 10% higher accuracy, 3–5% gains over text-based communication, and 2× faster responses.
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Code: github.com/thu-nics/C2C
Project: github.com/thu-nics
Paper: arxiv.org/abs/2510.03215
Our report: mp.weixin.qq.com/s/tjDq99VrE…
📬 #PapersAccepted by Jiqizhixin
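A minimal sketch of the idea: a learned projector maps the sharer model's cache into the receiver's space, and a gate blends the two. Dimensions, names, and the exact fusion rule are assumptions for illustration, not the C2C repo's actual code.

```python
import torch
import torch.nn as nn

# Toy cache-to-cache fusion: project the sharer's KV-cache into the receiver's
# space, then blend via a learned gate. Shapes and the fusion rule are
# illustrative assumptions, not the C2C paper's implementation.

class CacheFuser(nn.Module):
    def __init__(self, d_src: int, d_dst: int):
        super().__init__()
        self.projector = nn.Sequential(
            nn.Linear(d_src, d_dst), nn.SiLU(), nn.Linear(d_dst, d_dst)
        )
        self.gate = nn.Sequential(nn.Linear(2 * d_dst, d_dst), nn.Sigmoid())

    def forward(self, src_cache: torch.Tensor, dst_cache: torch.Tensor):
        # src_cache: [seq, d_src] from the sharer; dst_cache: [seq, d_dst]
        projected = self.projector(src_cache)
        g = self.gate(torch.cat([projected, dst_cache], dim=-1))
        return g * projected + (1 - g) * dst_cache  # fused receiver cache

fuser = CacheFuser(d_src=512, d_dst=768)
fused = fuser(torch.randn(16, 512), torch.randn(16, 768))
print(fused.shape)  # torch.Size([16, 768])
```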
Interesting: no need for a dedicated Azure landing zone just for OpenAI if you're on AWS
New multi-year, strategic partnership with @OpenAI will provide our industry-leading infrastructure for them to run and scale ChatGPT inference, training, and agentic AI workloads. This allows OpenAI to leverage our unusual experience running large-scale AI infrastructure securely, reliably, and at scale. OpenAI will start using AWS's infrastructure immediately, and we expect to have all of the capacity deployed before the end of next year, with the ability to expand in 2027 and beyond. aboutamazon.com/news/aws/aws…
Interesting relationship between language and programming aptitude
The biggest predictor of coding ability is language aptitude, not math. A study published in Nature found that numeracy accounts for just 2% of skill variance, while the neural behaviors associated with language accounted for 70% of skill variance.
Metadata is all you need
TIL .docx, .xlsx, and .pptx are just .zip archives with mostly XML inside.
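You can verify this with the standard library alone; a quick sketch ("example.docx" is a placeholder path):

```python
import zipfile

# A .docx is a ZIP container; list its contents to see the XML inside.
with zipfile.ZipFile("example.docx") as z:
    for name in z.namelist():
        print(name)   # e.g. [Content_Types].xml, word/document.xml, ...
    xml = z.read("word/document.xml")  # the main document body, as XML bytes
    print(xml[:200])
```

The same trick works on .xlsx (xl/workbook.xml) and .pptx (ppt/presentation.xml).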
New low-latency MMLM
🔥 LongCat-Flash-Omni: Multimodal + Low-Latency
🏆 Leading performance among open-source omni-modal models
☎️ Real-time spoken interaction: millisecond-level E2E latency
🕒 128K context + supports >8 min real-time AV interaction
🎥 Multimodal I/O: arbitrary combination of text/image/audio/video input → text/speech output (w/ LongCat-Audio-Codec)
⚙ ScMoE architecture on LongCat-Flash: 560B parameters, 27B active
🧠 Training: novel early-fusion omni-modal training paradigm → no single modality left behind
🚀 Efficient infrastructure: with optimized modality-decoupled parallel training, Omni sustains >90% of the throughput of text-only training
🤗 Model open-sourced:
【Hugging Face】huggingface.co/meituan-longc…
【GitHub】github.com/meituan-longcat/L…
📱 The LongCat app is here, available for both iOS and Android! Scan the QR code to quickly try its voice interaction features!
💻 For the PC experience, click LongCat.AI for a free trial!
Interesting
China has overtaken the US in cumulative open-source AI model downloads.