ML Engineer

India
Joined February 2013
Pooja Palod retweeted
New on the Anthropic Engineering blog: tips on how to build more efficient agents that handle more tools while using fewer tokens. Code execution with the Model Context Protocol (MCP): anthropic.com/engineering/co…
Why can ChatGPT write a beautiful poem but not a funny joke? 🤔 Because it has a huge rearview mirror (all of human knowledge) but only a tiny headlight (sees one word ahead). Brilliant analogy from Luis Serrano’s video 👇 🎥piped.video/watch?v=gGWufx8D…
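To make the "tiny headlight" half of the analogy concrete, here is a minimal sketch of greedy next-token generation: the model only ever scores one next token at a time. It assumes the Hugging Face transformers library and the small gpt2 checkpoint purely for illustration; any causal LM behaves the same way.

```python
# Minimal illustration of the "tiny headlight": an autoregressive LM
# only ever picks the single next token, one step at a time.
# Assumes the Hugging Face `transformers` library and the small `gpt2`
# checkpoint purely for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Roses are red, violets are"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                      # generate 10 tokens greedily
        logits = model(input_ids).logits     # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```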
Pooja Palod retweeted
Here's my beginner's lecture series for RAG, Vector Database, Agent, and Multi-Agents. Download slides: 👇
* RAG: byhand.ai/p/beginners-guide-…
* Agents: byhand.ai/p/beginners-guide-…
* Vector Database: byhand.ai/p/beginners-guide-…
* Multi-Agents: byhand.ai/p/beginners-guide-…
---
100% original, made by hand ✍️ Join 47K+ readers of my newsletter: byhand.ai
RNNs walked so Transformers could run. Attention wasn’t just a mechanism; it was a revolution. Read why 👇 open.substack.com/pub/datajo…
Hey folks, check out this new article on transformers. 🧠 The Need for Transformers: open.substack.com/pub/datajo…
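For readers who want the mechanism behind the headline before clicking through, here is a minimal sketch of scaled dot-product attention in plain PyTorch. The shapes and names are illustrative only and are not taken from the linked article.

```python
# Minimal scaled dot-product attention in plain PyTorch, as a reminder of
# the core operation the posts above refer to. Shapes are illustrative.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)                   # attention weights
    return weights @ v                                        # weighted sum of values

q = k = v = torch.randn(2, 5, 64)   # toy batch: 2 sequences of length 5
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([2, 5, 64])
```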
Pooja Palod retweeted
An exciting new course: Fine-tuning and Reinforcement Learning for LLMs: Intro to Post-training, taught by @realSharonZhou, VP of AI at @AMD. Available now at DeepLearning.AI.

Post-training is the key technique used by frontier labs to turn a base LLM--a model trained on massive unlabeled text to predict the next word/token--into a helpful, reliable assistant that can follow instructions. I've also seen many applications where post-training is what turns a demo application that works only 80% of the time into a reliable system that performs consistently. This course will teach you the most important post-training techniques!

In this 5-module course, Sharon walks you through the complete post-training pipeline: supervised fine-tuning, reward modeling, RLHF, and techniques like PPO and GRPO. You'll also learn to use LoRA for efficient training, and to design evals that catch problems before and after deployment.

Skills you'll gain:
- Apply supervised fine-tuning and reinforcement learning (RLHF, PPO, GRPO) to align models to desired behaviors
- Use LoRA for efficient fine-tuning without retraining entire models
- Prepare datasets and generate synthetic data for post-training
- Understand how to operate LLM production pipelines, with go/no-go decision points and feedback loops

These advanced methods aren’t limited to frontier AI labs anymore, and you can now use them in your own applications. Learn here: deeplearning.ai/courses/fine…
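As a concrete companion to the LoRA point above, here is a minimal plain-PyTorch sketch of the low-rank adapter idea: freeze the pretrained weight and train only a small B·A update. This is not the course's code; the rank, alpha, and layer sizes are arbitrary illustrative choices.

```python
# Minimal sketch of the LoRA idea: freeze the original weight W and learn
# a low-rank update B @ A on top of it, so only a tiny fraction of
# parameters is trained. Plain PyTorch, not the course's code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # frozen pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / r) * B A x  -- only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only a few % of the layer
```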
Your growth is your responsibility. Don’t outsource it. #GrowthMindset #SelfDevelopment #Ownership #CareerGrowth #Leadership
Pooja Palod retweeted
Just finished 40+ pages: Stanford CodeMonkeys paper + Andrew Ng post + Together AI MoA blog.
𝗔𝗹𝗹 𝗽𝗼𝗶𝗻𝘁 𝘁𝗼 𝗼𝗻𝗲 𝘁𝗿𝘂𝘁𝗵: 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁𝘀 are how AI scales.

WHY PARALLEL MATTERS
𝟭 Training compute is tapped out — inference-time scaling is next.
𝟮 Serial reasoning boosts accuracy but slows response.
𝟯 Parallel agents cut latency while keeping coverage high.
𝟰 Falling token costs make parallelism practical.
𝟱 Decomposing tasks for agents is still hard.
𝟲 Test-time compute may rival training-time scaling.
𝟳 Parallelism trades compute for both speed and accuracy.
𝟴 Think “distributed cognition” — teams of minds in sync.

REAL-WORLD EXAMPLES
𝟵 Research agents scan multiple pages in parallel.
𝟭𝟬 Coding frameworks split repos for simultaneous fixes.
𝟭𝟭 Heavy background agent + fast UI agent = smoother UX.
𝟭𝟮 Stanford’s CodeMonkeys: 57.4% SWE-bench Verified.
𝟭𝟯 Barrel of Monkeys ensemble: 66.2% accuracy.
𝟭𝟰 Competitions show log-linear gains with more samples.
𝟭𝟱 Synthetic data pipelines thrive on multi-generator diversity.
𝟭𝟲 Report assistants draft in parallel, then aggregate.

MIXTURE-OF-AGENTS (MoA)
𝟭𝟳 Together’s MoA: 4 LLMs + 1 aggregator.
𝟭𝟴 Beats GPT-4 Omni (65.1% vs 57.5% on AlpacaEval).
𝟭𝟵 Ideal when quality > latency (e.g. synthetic data).
𝟮𝟬 MoA fits in <50 lines of Python.
𝟮𝟭 Diversity + aggregation > single model.
𝟮𝟮 Aggregators filter noise, act as judges.
𝟮𝟯 Layered MoA = compounding gains.
𝟮𝟰 Open-source ensembles can beat frontier closed models.

DESIGN PRINCIPLES
𝟮𝟱 Balance serial vs parallel — more isn’t always better.
𝟮𝟲 Reuse context scans to save cost.
𝟮𝟳 Use executable tests, not just vibes.
𝟮𝟴 Smart selection halves the gap to oracle results.
𝟮𝟵 Strong selectors > diverse generators alone.
𝟯𝟬 Majority voting remains simple + powerful.
𝟯𝟭 Too many agents without orchestration = waste.
𝟯𝟮 Heterogeneous models prevent groupthink.

THINKING AHEAD
𝟯𝟯 Async workflows will beat rigid sequential chains.
𝟯𝟰 Role-based agent teams (UI, verifier, worker) are next.
𝟯𝟱 Agent throughput may soon rival human teams.
𝟯𝟲 100M token windows supercharge parallelism.
𝟯𝟳 Negotiating/critique agents > isolated runs.
𝟯𝟴 Budget-aware agents will drive real deployments.
𝟯𝟵 Human-in-loop orchestration stays key early on.
𝟰𝟬 Ceiling is high: hundreds of agents in parallel.

Stanford: arxiv.org/pdf/2501.14723
Andrew Ng: deeplearning.ai/the-batch/is…
TogetherAI: docs.together.ai/docs/mixtur…

≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣
⫸ꆛ Want to build Real-World AI Agents? Join My 𝗛𝗮𝗻𝗱𝘀-𝗼𝗻 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝟱-𝗶𝗻-𝟭 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴!
➠ Build Agents for Healthcare, Finance, Smart Cities & More
➠ Master 5 Modules: 𝗠𝗖𝗣 · LangGraph · PydanticAI · CrewAI · Swarm
➠ Work with Text, Audio, Video, Tabular, and Vision Data
👉 𝗘𝗻𝗿𝗼𝗹𝗹 𝗡𝗢𝗪 (𝟱𝟲% 𝗢𝗙𝗙): maryammiradi.com/ai-agents-m…
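To ground the mixture-of-agents pattern described above (several proposers in parallel, one aggregator), here is a minimal asyncio sketch. `call_model`, the model names, and the prompts are hypothetical stand-ins for whatever LLM client and models you actually use; this is not Together's reference implementation.

```python
# Minimal asyncio sketch of the mixture-of-agents pattern described above:
# several proposer models answer in parallel, then a single aggregator
# merges their drafts. `call_model` is a hypothetical stand-in for a real
# LLM client (Together, OpenAI, a local server, ...).
import asyncio

PROPOSERS = ["model-a", "model-b", "model-c", "model-d"]
AGGREGATOR = "model-aggregator"

async def call_model(model: str, prompt: str) -> str:
    # Placeholder: replace with a real async API call.
    await asyncio.sleep(0.1)
    return f"[{model}] draft answer to: {prompt}"

async def mixture_of_agents(prompt: str) -> str:
    # Fan out: all proposers run concurrently, not one after another.
    drafts = await asyncio.gather(*(call_model(m, prompt) for m in PROPOSERS))
    # Fan in: the aggregator sees every draft and writes the final answer.
    synthesis_prompt = (
        "Synthesize the best single answer from these candidate responses:\n"
        + "\n".join(drafts)
        + f"\n\nOriginal question: {prompt}"
    )
    return await call_model(AGGREGATOR, synthesis_prompt)

print(asyncio.run(mixture_of_agents("Explain parallel AI agents in one paragraph.")))
```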
Pooja Palod retweeted
Do give it a read “Modular Prompt Design-Building Blocks Over Monoliths (Part-1)” medium.com/@deepakkumar05.it…
Pooja Palod retweeted
I'm writing about evaluations in Generative AI systems. Do keep an eye on my substack posts open.substack.com/pub/deepak… #generativeai #evaluation
🪆 Matryoshka Embeddings: Russian Dolls for Vectors. Instead of storing huge flat embeddings, nest them like Russian dolls. ✅ Multi-resolution ✅ Memory-efficient ✅ Fast search. Read here: datajourney24.substack.com/p…
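As a concrete illustration of the nesting idea, here is a minimal sketch of how a Matryoshka-style embedding is used at query time: truncate to a prefix of the dimensions, re-normalize, and search at the lower resolution. Random vectors stand in for the outputs of a real Matryoshka-trained model, and the dimensions are arbitrary.

```python
# Minimal sketch of the Matryoshka idea: with a Matryoshka-trained model,
# a prefix of the full embedding is itself a usable lower-resolution
# embedding -- truncate, re-normalize, and search with far less memory.
# Random vectors stand in for real model outputs here.
import numpy as np

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    small = emb[..., :dim]                    # keep only the first `dim` coordinates
    return small / np.linalg.norm(small, axis=-1, keepdims=True)

full_dim, small_dim = 768, 64
query = np.random.randn(full_dim)
corpus = np.random.randn(10_000, full_dim)

q64 = truncate_and_normalize(query, small_dim)
c64 = truncate_and_normalize(corpus, small_dim)

scores = c64 @ q64                 # cosine similarity in the 64-d "inner doll"
top5 = np.argsort(-scores)[:5]     # coarse candidates at ~1/12 the memory
print(top5)                        # re-rank these with the full 768-d vectors if needed
```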
Life is like a learning model—every experience trains you, every challenge fine-tunes you, and every moment is data for your growth. Appreciate the journey and keep evolving.
Pooja Palod retweeted
gpt-oss is out! we made an open model that performs at the level of o4-mini and runs on a high-end laptop (WTF!!) (and a smaller one that runs on a phone). super proud of the team; big triumph of technology.
ML system design interviews aren't about fancy models. They're about scoping the problem right before solving it. Ask: Ranking, classification, or something else? Search vs Recommendation vs Feed? Real-time or batch? What constraints matter? Scope first. Build later.
Pooja Palod retweeted
New Course: Post-training of LLMs

Learn to post-train and customize an LLM in this short course, taught by @BanghuaZ, Assistant Professor at the University of Washington @UW, and co-founder of @NexusflowX.

Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning.

Post-training transforms a general-purpose token predictor—trained on trillions of unlabeled text tokens—into an assistant that follows instructions and performs specific tasks. Because it is much cheaper than pre-training, it is practical for many more teams to incorporate post-training methods into their workflows than pre-training.

In this course, you’ll learn three common post-training methods—Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL)—and how to use each one effectively. With SFT, you train the model on pairs of input and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and updates the model to improve performance.

You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior.

In detail, you’ll:
- Understand what post-training is, when to use it, and how it differs from pre-training.
- Build an SFT pipeline to turn a base model into an instruct model.
- Explore how DPO reshapes behavior by minimizing contrastive loss—penalizing poor responses and reinforcing preferred ones.
- Implement a DPO pipeline to change the identity of a chat assistant.
- Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions.
- Train a model with GRPO to improve its math capabilities using a verifiable reward.

Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today.

Please sign up here: deeplearning.ai/short-course…
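As a companion to the DPO description above, here is a minimal sketch of the DPO loss in plain PyTorch: the policy is pushed to widen its chosen-vs-rejected margin relative to a frozen reference model. This is not the course notebook; the log-probabilities and beta value below are toy numbers.

```python
# Minimal sketch of the DPO objective described above (not the course
# notebook): push the policy to prefer the chosen response over the
# rejected one, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    # Inputs are sequence-level log-probs (sum of token log-probs) per response.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # loss = -log sigmoid(beta * (policy margin - reference margin))
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy numbers: the policy already slightly prefers the chosen response.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
print(loss.item())
```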