Pinned Tweet
XTTS is still being downloaded almost 5M times every month, 2.1M of those on HF alone. That's more than many recent hyped models. I hope people use it well, so it was worth the burnout I'm still recovering from. Coqui has been one of the most successful broke startups.
It's already called Meta-Learning. Do we need a new name?
Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: goo.gle/47LJrzI @GoogleAI
Testing KDA from Kimi Linear, the best transformer variant I've found that matches full-attention performance. Using interleaved full attention (FA) as in the official model. At the context lengths I use it shows no benefit yet, which is interesting! Try it: github.com/erogol/BlaGPT
Just read the paper, but what if the simulation is based on a predictive system like a NN? Then the argument basically collapses.
Researchers have mathematically proven that the universe cannot be a computer simulation. Their paper in the Journal of Holography Applications in Physics shows that reality operates on principles beyond computation. Using Gödel's incompleteness theorem, they argue that no algorithmic or computational system can fully describe the universe, because some truths, the so-called "Gödelian truths", require non-algorithmic understanding, a form of reasoning that no computer or simulation can reproduce. Since all simulations are inherently algorithmic, and the fundamental nature of reality is non-algorithmic, the researchers conclude that the universe cannot be, and could never be, a simulation.
Just a small detail before the hype train: this is a hybrid model and it still uses full attention.
Kimi Linear Tech Report is dropped! 🚀
huggingface.co/moonshotai/Ki…
Kimi Linear: a novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache usage and up to 6x decoding throughput at a 1M context length.
Key highlights:
🔹 Kimi Delta Attention: a hardware-efficient linear attention mechanism that refines the gated delta rule.
🔹 Kimi Linear Architecture: the first hybrid linear architecture to surpass pure full attention quality across the board.
🔹 Empirical Validation: scaled, fair comparisons plus open-sourced KDA kernels, vLLM integration, and checkpoints.
The future of agentic-oriented attention is here! 💡
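To make those two highlights concrete, here is a toy PyTorch sketch of a gated delta-rule attention layer interleaved with periodic full-attention layers. This is a minimal reconstruction for illustration only; the module names, the scalar gates, and the 3:1 interleave ratio are assumptions, not the official KDA kernels or the Kimi Linear layout.

```python
import torch
import torch.nn as nn


class GatedDeltaAttention(nn.Module):
    """Toy single-head gated delta-rule recurrence (illustrative, not the KDA kernel)."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gates = nn.Linear(dim, 2)   # per-token decay (alpha) and write strength (beta)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                # x: (batch, seq, dim)
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        alpha, beta = torch.sigmoid(self.gates(x)).unbind(-1)
        state = x.new_zeros(b, d, d)     # fixed-size fast-weight memory instead of a growing KV cache
        outs = []
        for t in range(s):
            qt, kt, vt = q[:, t], k[:, t], v[:, t]
            pred = torch.einsum("bvk,bk->bv", state, kt)           # read current memory
            delta = beta[:, t, None] * (vt - pred)                  # delta-rule error correction
            state = alpha[:, t, None, None] * state + torch.einsum("bv,bk->bvk", delta, kt)
            outs.append(torch.einsum("bvk,bk->bv", state, qt))
        return self.out(torch.stack(outs, dim=1))


def hybrid_stack(dim: int, n_layers: int, full_attn_every: int = 4) -> nn.ModuleList:
    """Mostly linear-attention layers, with a full softmax-attention layer every N blocks."""
    return nn.ModuleList(
        [
            nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            if (i + 1) % full_attn_every == 0
            else GatedDeltaAttention(dim)
            for i in range(n_layers)
        ]
    )


if __name__ == "__main__":
    layers = hybrid_stack(dim=64, n_layers=8)
    x = torch.randn(2, 16, 64)
    for layer in layers:
        if isinstance(layer, nn.MultiheadAttention):
            y, _ = layer(x, x, x, need_weights=False)
        else:
            y = layer(x)
        x = x + y                        # residual connection
    print(x.shape)                       # torch.Size([2, 16, 64])
```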
MS is slowly killing GitHub with Copilot
like diffusion 🤔
🧠 New preprint: How Do LLMs Use Their Depth? We uncover a “Guess-then-Refine” mechanism across layers - early layers predict high-frequency tokens as guesses; later layers refine them as context builds Paper - arxiv.org/abs/2510.18871 @neuranna @GopalaSpeech @berkeley_ai
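A quick way to eyeball this kind of depth behavior is a logit-lens-style probe: decode each layer's hidden state through the final unembedding and watch the predicted token change with depth. A minimal sketch, assuming gpt2 from Hugging Face transformers; this is not necessarily the paper's own probing methodology.

```python
# Hypothetical logit-lens-style probe: reuse the final layer norm and the
# unembedding to decode each intermediate hidden state.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (n_layers + 1) tensors of shape (1, seq, dim)
for layer_idx, h in enumerate(out.hidden_states):
    h_last = model.transformer.ln_f(h[:, -1])    # last position, final layer norm
    logits = model.lm_head(h_last)               # decode through the unembedding
    token = tok.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d} -> {token!r}")
```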
pls stop asking for feedback @claudeai-code!!
🚀 Linearizing Video Diffusion Transformers (Wan 2.1) in less than 0.5K GPU hours 🚀 qualcomm-ai-research.github.… TLDR: Balance the expressiveness of self-attention and efficiency of linear attention in a hybrid attention distillation framework. arxiv.org/abs/2509.24899
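The distillation idea in a nutshell, as a toy sketch rather than the paper's recipe: freeze a softmax-attention teacher layer and train a cheaper linear-attention student to match its outputs, keeping some softmax layers in the hybrid for expressiveness. The elu feature map, layer sizes, and plain MSE objective here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttentionStudent(nn.Module):
    """Toy non-causal linear attention with an (elu + 1) feature map."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bsd,bse->bde", k, v)              # aggregate key-value outer products
        z = 1.0 / (torch.einsum("bsd,bd->bs", q, k.sum(1)) + 1e-6)
        return self.out(torch.einsum("bsd,bde,bs->bse", q, kv, z))


dim = 256
teacher = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
for p in teacher.parameters():
    p.requires_grad = False                                  # teacher layer stays frozen

student = LinearAttentionStudent(dim)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(4, 128, dim)                                 # (batch, video tokens, dim)
with torch.no_grad():
    target, _ = teacher(x, x, x, need_weights=False)

loss = F.mse_loss(student(x), target)                        # match the teacher layer's output
loss.backward()
opt.step()
```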
I still play it
If you ever think you're a good programmer, just remember this dude wrote RollerCoaster Tycoon by himself, completely in assembly, earning $30M in royalties.
Definitely massive! A revolution at the heart of image generation!
Representation Autoencoders (RAEs) are a simple yet powerful upgrade that replaces the traditional VAE with pretrained encoders, such as DINO, SigLIP, or MAE, paired with trained decoders.
Why it matters:
- Richer latent spaces: semantically meaningful, not just reconstructive
- Faster convergence: no extra alignment loss needed
- Higher fidelity: achieves FID scores of 1.51 (without guidance) and 1.13 at 256×256 and 512×512 resolutions
By rethinking the foundation, RAEs make diffusion transformers simpler, stronger, and smarter.
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
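The core swap is easy to sketch: keep a frozen pretrained vision encoder as the encoder half of the autoencoder and train only a decoder back to pixels; a diffusion transformer is then trained in that frozen latent space. Below is a minimal toy version, assuming DINOv2-base as the encoder and an MLP decoder; the real RAE decoder and training setup are more involved.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class ToyRAE(nn.Module):
    """Frozen pretrained encoder + trainable decoder (illustrative sketch only)."""

    def __init__(self, latent_dim: int = 768, image_size: int = 224):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("facebook/dinov2-base")
        for p in self.encoder.parameters():
            p.requires_grad = False                      # encoder stays frozen
        # Toy MLP decoder from the CLS latent back to pixels; real RAEs decode
        # the full patch-token grid with a proper decoder network.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, 3 * image_size * image_size),
        )
        self.image_size = image_size

    def forward(self, pixel_values):                     # (batch, 3, H, W)
        with torch.no_grad():
            latent = self.encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        recon = self.decoder(latent)
        return recon.view(-1, 3, self.image_size, self.image_size)


rae = ToyRAE()
x = torch.randn(1, 3, 224, 224)
loss = nn.functional.mse_loss(rae(x), x)                 # only the decoder gets gradients
loss.backward()
```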
Here is my take on the new DeepSeek-V3.2-Exp: erogol.substack.com/p/model-…
The ASR model in the iOS app is really bad @claudeai
Went from 2.2 to 2.5 in 5 days. Impressive.
Wan2.5: One Prompt, Perfect 'Vibe PSing'! Wan 2.5-Preview is now live with image editing.
✨ Instruction-based Image Editing. Supports a wide range of image-editing tasks and reliably follows instructions.
✨ Visual Elements Consistency. Supports generation from single- or multiple-image references, maintaining consistency of visual elements such as faces, products, and styles.
My post on Xiaomi's MiMo-Audio.
They show speech follows the same scaling laws as text LLMs. Trained on 100M+ hours, it shows emergent few-shot learning:
• Voice conversion
• Emotion transfer
• Speech translation
• Cross-modal reasoning
open.substack.com/pub/erogol…
Sounds good!
Introducing Qwen3-TTS! 🗣️ Our new text-to-speech model is designed to be multi-timbre, multi-lingual, and multi-dialect for natural, expressive audio. It delivers strong performance in English & Chinese, and we're excited for you to hear it for yourself!
Great release from Xiaomi.
- open 7B model
- better than Gemini on audio understanding
- better than GPT-4o-audio on reasoning
Chinese labs are literally killing it. Time to go deeper, and maybe a blog post afterwards...
xiaomimimo.github.io/MiMo-Au…
Dario Amodei says Claude now writes 70-90% of code at Anthropic... I believe him :)
Replying to @claudeai
To state it plainly: We never reduce model quality due to demand, time of day, or server load. The problems our users reported were due to infrastructure bugs alone.
erogol retweeted
1/7 We're launching Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's Deep Research, with only 30B parameters (3B activated)! The Tongyi DeepResearch agent demonstrates state-of-the-art results, scoring 32.9 on Humanity's Last Exam, 45.3 on BrowseComp, and 75.0 on the xbench-DeepSearch benchmark.