Steve Sperandeo 🇨🇦 retweeted
McKinsey just dropped its 2025 AI report.

1. Everyone’s testing, few are scaling. 88% of companies now use AI somewhere. Only 33% have scaled it beyond pilots.
2. The profit gap is huge. Just 6% see real EBIT impact. Most are still stuck in “experiments,” not execution.
3. The winners think bigger. Top performers aren’t cutting costs. They’re redesigning workflows and creating new products.
4. AI agents are emerging. 23% are testing agents. Only 10% have scaled them (mostly in IT and R&D).
5. The jobs shift is starting. 30% of companies expect workforce reductions next year, mostly in junior or support roles.

TL;DR: AI adoption is nearly universal. Impact isn’t. The gap between pilots and profit is where the next unicorns will be built.
Universal Ostrich Farms - Edgewood, BC "That's your Canadian government right there that just did this. I went to Bosnia, Somalia and Afghanistan and I did not serve my country for this bullsh*t that's in front of us. The government committed their own a-f*cking-trocity" Sgt. Mike Rude (retired)
OMG 💔 appears the CFIA slaughtered the entire flock of healthy ostriches through the night 📸 @DreaHumphrey Dark day for Canada
Universal Ostrich Farm Update I just confirmed, reports are that hundreds of shots were fired in the area of the kill pen at Universal Ostrich Farm. About 100 shots in the first hour, then a shift change, then hundreds more. Sounds like a high-powered rifle, and the speculation is the platforms set up today are being used. This is happening under the cover of night, and floodlights are being used. Family and supporters are forced to listen to this as the entrance to the property is blocked by RCMP, and anyone who leaves is not being let back in. This is inhumane treatment of both the ostriches and the people there. Tonight, the last of the hope for Canada is dying in Edgewood BC.
Scaling Agent Learning via Experience Synthesis
📝: arxiv.org/abs/2511.03773
Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay buffer + New tasks = cheap RL for any environment!
- Strong improvements on non-RL-ready environments across multiple model families!
- Works better in sim-2-real RL settings → Warm-start for high-cost environments
🧵1/7
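The recipe above can be sketched as a toy loop: a cheap "world model" stands in for the real environment so an agent can collect RL experience offline. All names here (`ToyWorldModel`, `collect_experience`) are illustrative stand-ins, not the paper's actual API.

```python
class ToyWorldModel:
    """Deterministic stand-in for the reasoning-LLM simulator:
    state is an integer, actions shift it, episode ends at the goal."""
    def __init__(self, goal=3):
        self.goal = goal

    def step(self, state, action):
        next_state = state + action                       # simulated transition
        reward = 1.0 if next_state == self.goal else 0.0  # simulated reward
        done = next_state == self.goal
        return next_state, reward, done

def collect_experience(policy, world_model, episodes, max_steps=10):
    """Roll out the policy in the simulated env, filling a replay buffer
    that a downstream RL trainer would sample from."""
    buffer = []
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            action = policy(state)
            next_state, reward, done = world_model.step(state, action)
            buffer.append((state, action, reward, next_state, done))
            state, steps = next_state, steps + 1
    return buffer

# A trivial always-step-right policy fills the buffer cheaply, no real env needed
buffer = collect_experience(lambda s: 1, ToyWorldModel(goal=3), episodes=2)
```

The point of the sketch is the warm-start economics: trajectories come from the simulator at near-zero cost, and only the final fine-tuning touches the expensive real environment.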
The paper behind Kosmos. An AI scientist that runs long, parallel research cycles to autonomously find and verify discoveries. One run can coordinate 200 agents, write 42,000 lines of code, and scan 1,500 papers. A shared world model stores facts, results, and plans so agents stay in sync. Given a goal and dataset, it runs analyses and literature searches in parallel and updates that model. It then proposes next tasks and repeats until it writes a report with traceable claims. Experts judged 79.4% of statements accurate and said 20 cycles equals about 6 months of work. Across 7 studies, it reproduced unpublished results, added causal genetics evidence, proposed a disease timing breakpoint method, and flagged a neuron aging mechanism. It needs clean, well-labeled data, can overstate interpretations, and still requires human review. Net effect: it scales data-driven discovery with clear provenance and steady context across fields. ---- Paper – arxiv.org/abs/2511.02824 Paper Title: "Kosmos: An AI Scientist for Autonomous Discovery"
📈 Edison Scientific launched Kosmos, an autonomous AI researcher that reads literature, writes and runs code, tests ideas. Compresses 6 months of human research into about 1 day.

Kosmos uses a structured world model as shared memory that links every agent’s findings, keeping work aligned to a single objective across tens of millions of tokens. A run reads 1,500 papers, executes 42,000 lines of analysis code, and produces a fully auditable report where every claim is traceable to code or literature.

Evaluators found 79.4% of conclusions accurate, it reproduced 3 prior human findings including absolute humidity as the key factor for perovskite solar cell efficiency and cross species neuronal connectivity rules, and it proposed 4 new leads including evidence that SOD2 may lower cardiac fibrosis in humans.

Access is through Edison’s platform at $200/run with limited free use for academics. There are caveats since runs can chase statistically neat but irrelevant signals, longer runs raise this risk, and teams often launch multiple runs to explore different paths.

Beta users estimated 6.14 months of equivalent effort for 20 step runs, and a simple model based on reading time and analysis time predicts about 4.1 months, which suggests output scales with run depth rather than hitting a fixed ceiling.
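The cycle described above (parallel analyses, shared world model, propose next tasks, repeat) can be sketched roughly like this. `WorldModel`, `research_cycle`, and the lambdas are hypothetical stand-ins for Kosmos's agents and learned components, not its actual interfaces.

```python
class WorldModel:
    """Shared store of facts, results, and plans that keeps agents in sync."""
    def __init__(self):
        self.facts = []
    def update(self, findings):
        self.facts.extend(findings)

def research_cycle(world_model, tasks, run_agent, propose_tasks):
    findings = [run_agent(t, world_model) for t in tasks]  # analyses + searches, parallel in spirit
    world_model.update(findings)                           # merge results into shared memory
    return propose_tasks(world_model)                      # world model proposes the next round

wm = WorldModel()
tasks = ["analyze dataset", "search literature"]
for _ in range(3):  # a real run repeats for up to ~20 cycles before writing the report
    tasks = research_cycle(
        wm, tasks,
        run_agent=lambda t, m: f"finding from {t}",
        propose_tasks=lambda m: [f"follow-up {len(m.facts)}"],
    )
```

The shared store is what makes claims traceable: every finding lands in one place, so the final report can cite the cycle and task that produced it.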
MemSearcher trains LLM search agents to keep a compact memory, boosting accuracy and cutting cost. Most agents copy the full history, bloating context and slowing inference, but MemSearcher keeps only essential facts. Each turn it reads the question and memory, then searches or answers. After reading results, it rewrites memory to keep only what matters. This holds token length steady across turns, lowering compute and GPU use. Training uses reinforcement learning with Group Relative Policy Optimization. Their variant shares a session reward across turns, teaching memory, search, and reasoning together. Across 7 QA benchmarks it beats strong baselines, with a 3B model surpassing some 7B agents. It uses fewer tokens than ReAct, so long tasks stay efficient and reliable. ---- Paper – arxiv.org/abs/2511.02805 Paper Title: "MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning"
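A minimal sketch of the turn loop described above: the agent conditions only on (question, compact memory), chooses to search or answer, and rewrites memory after each search. `toy_policy` and the inline search function are stand-ins for the trained LLM and the retriever, not MemSearcher's real components.

```python
def agent_loop(question, policy, search, max_turns=5):
    memory = ""  # compact memory replaces the full interaction history
    for _ in range(max_turns):
        decision = policy(question, memory)
        if decision["action"] == "answer":
            return decision["text"], memory
        results = search(decision["query"])
        memory = decision["summarize"](memory, results)  # rewrite: keep only what matters
    return None, memory

def toy_policy(question, memory):
    if "Paris" in memory:
        return {"action": "answer", "text": "Paris"}
    return {
        "action": "search",
        "query": "capital of France",
        "summarize": lambda mem, res: res[:30],  # bounded rewrite keeps token length steady
    }

answer, memory = agent_loop(
    "What is the capital of France?",
    toy_policy,
    lambda q: "Paris is the capital of France.",
)
```

Because memory is rewritten to a bounded size each turn rather than appended to, the context the model sees stays flat no matter how many search turns the task takes.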
🧵 Dr. Robert Redfield (former CDC Director) said that he’s seen ~85% success in patients with Long Covid within 1–3 years (vaccine-injured are more resistant to treatment and he doesn't know exactly why). I took notes. Here’s what he reportedly uses, and what each targets 👇 Fatigue / Cognitive Dysfunction / PN (peripheral neuropathy) • Maraviroc (Selzentry) – 300 mg 2×/day (CCR5 antagonist; blocks immune cell trafficking) • Rapamycin (Sirolimus) – 1–2 mg/day (mTOR inhibitor; modulates immune aging, inflammation) 🧩 Probenecid(?) was also mentioned — typically used for gout (helps excrete uric acid), but it also affects OAT transporters and viral replication pathways, so perhaps that’s the rationale. Hypercoagulation / Vascular Dysfunction • Apixaban (Eliquis) – anticoagulant • Plavix (Clopidogrel) – antiplatelet • Aspirin – antiplatelet, anti-inflammatory Doses weren't mentioned. Triple therapy is aggressive and reserved for documented hypercoagulable states, and likely requires close monitoring (a microclotting diagnostician was mentioned in Florida around 50 minutes into the podcast). Mast Cell Stabilization • Pepcid (Famotidine) – 40 mg 2×/day (H₂ blocker; sometimes paired with H₁ antihistamines) “Can’t breathe, but fine when swimming?” Redfield suggested that’s often due to venous congestion — blood pooling from impaired return (e.g., pelvic compression or May-Thurner–type syndromes). 🩻 Venogram → possible stent surgery for relief.
🚨 🚨 NEW: Round 2 with ex-CDC Director, Dr. Robert Redfield!! An HIV pioneer, virologist, infectious diseases doctor, & pandemic whistleblower, he’s back with never-before-heard revelations. (FOR REAL 👀) Our last interview made global headlines when he revealed that the original Covid viral lines likely came from Ralph Baric’s coronavirus research lab at University of North Carolina. We go FURTHER today into the US Role in Covid! Now he returns with a new book, “Redfield’s Warning: What I Learned (But Couldn’t Tell You) Might Save Your Life.” I highly recommend it! 8 Explosive Highlights from our Interview • COVID engineered as aerosolized, self-spreading vaccine 🤯 • Vaccine mandates, pharma immunity, side effect denial — all “mistakes” • ‼️ Long COVID driven by viral persistence & remarkable treatments he’s discovered • How FDA’s Peter Marks killed Novavax • mRNA may cause cancer via residual nucleic acid & can produce ongoing spike 🤯 • Antibody-dependent enhancement possible with boosters • The Chronic Lyme / long COVID connection • PREP Act immunity should be repealed & concerns about new NIAID Director ✅Please SUBSCRIBE to my YT channel, comment & share!! And also, consider supporting my hard work by becoming a paid subscriber on Substack or sponsor! I’d greatly appreciate it! 🙏🩷
XBOW raised $117M to build AI hacking agents. Now someone just open-sourced it for FREE.

Strix deploys autonomous AI agents that act like real hackers - they run your code dynamically, find vulnerabilities, and validate them through actual proof-of-concepts.

Why it matters: The biggest problem with traditional security testing is that it doesn't keep up with development speed. Strix solves this by integrating directly into your workflow:
↳ Run it in CI/CD to catch vulnerabilities before production
↳ Get real proof-of-concepts, not false positives from static analysis
↳ Test everything: injection attacks, access control, business logic flaws

The best part? You don't need to be a security expert. Strix includes a complete hacker toolkit - HTTP proxy, browser automation, and Python runtime for exploit development. It's like having a security team that works at the speed of your CI/CD pipeline.

And it all runs locally in Docker containers, so your code never leaves your environment.

Getting started is simple:
- pipx install strix-agent
- Point it at your codebase (app, repo, or directory)

Everything is 100% open-source! I've shared the link to the GitHub repo in the replies!
When I was a kid, bedtime was 9 pm. I couldn't wait to be a grownup so I could go to bed anytime I wanted. Turns out that is 9 pm.
The recently released DeepSeek-OCR paper has huge implications for AI memory, long-context problems, and token budgets. It frames the OCR model not only as a document-reading tool but as an experiment in how models can “remember” more by storing data as images rather than text tokens. With this paper, DeepSeek really found a new way to store long context by turning text into images and reading them with optical character recognition, so the model keeps more detail while spending fewer tokens. DeepSeek's technique packs the running conversation or documents into visual tokens made from page images, which are 2D patches that often cover far more content per token than plain text pieces. The system can keep a stack of these page images as the conversation history, then call optical character recognition only when it needs exact words or quotes. Because layout is preserved in the image, things like tables, code blocks, and headings stay in place, which helps the model anchor references and reduces misreads that come from flattened text streams. The model adds tiered compression, so fresh and important pages are stored at higher resolution while older pages are downsampled into fewer patches that still retain the gist for later recovery. That tiering acts like a soft memory fade where the budget prefers recent or flagged items but does not fully discard older context, which makes retrieval cheaper without a hard cutoff. Researchers who reviewed it point out that text tokens can be wasteful for long passages, and that image patches may be a better fit for storing large slabs of running context. On the compute side, attention cost depends on sequence length, so swapping thousands of text tokens for hundreds of image patches can lower per-step work across layers.
There is a latency tradeoff because pulling exact lines may require an optical character recognition pass, but the gain is that most of the time the model reasons over compact visual embeddings instead of huge text sequences. DeepSeek also reports that the pipeline can generate synthetic supervision at scale by producing rendered pages and labels, with throughput around 200,000 pages per day on 1 GPU. The method will not magically fix all forgetting because it still tends to favor what arrived most recently, but it gives the system a cheaper way to keep older material within reach instead of truncating it. For agent workloads this is appealing, since a planning bot can stash logs, instructions, and tool feedback as compact pages and then recall them hours later without blowing the token window. Compared with vector databases and retrieval augmented generation, this keeps more memory inside the model context itself, which reduces glue code and avoids embedding drift between external stores and the core model. --- technologyreview.com/2025/10/29/1126932/deepseek-ocr-visual-compression
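The tiered-compression idea amounts to a patch-budget schedule over the page stack: newest pages get the most visual tokens, older ones fade to fewer patches. A toy sketch follows; the tier sizes and function name are illustrative, not DeepSeek's actual configuration.

```python
def assign_budgets(pages, tiers=(400, 100, 25)):
    """Give the newest pages the largest visual-token budget; older pages
    are downsampled into fewer patches that still retain the gist."""
    budgeted = []
    for age, page in enumerate(reversed(pages)):  # age 0 = most recent page
        tier = min(age, len(tiers) - 1)           # all old pages share the smallest tier
        budgeted.append((page, tiers[tier]))
    return list(reversed(budgeted))               # restore original page order

pages = ["page1", "page2", "page3", "page4", "page5"]
budgeted = assign_budgets(pages)
total_patches = sum(budget for _, budget in budgeted)  # vs 5 * 400 at full resolution
```

With these toy numbers, five pages cost 575 patches instead of 2,000 at uniform full resolution, which is the "soft memory fade" in miniature: nothing is discarded, but old context is cheap to keep.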
🚨 This might be the biggest leap in AI agents since ReAct. Researchers just dropped DeepAgent, a reasoning model that can think, discover tools, and act completely on its own. No pre-scripted workflows. No fixed tool lists. Just pure autonomous reasoning. It introduces something wild called Memory Folding: the agent literally “compresses” its past thoughts into structured episodic, working, and tool memories… like a digital brain taking a breath before thinking again. They also built a new RL method called ToolPO, which rewards the agent not just for finishing tasks, but for how it used tools along the way. The results? DeepAgent beats GPT-4-level agents on almost every benchmark: WebShop, ALFWorld, GAIA, even with open-set tools it’s never seen. It’s the first real step toward general reasoning agents that can operate like humans: remembering, adapting, and learning how to think. The agent era just leveled up.
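"Memory Folding" as described above might look roughly like this: a growing interaction history gets compressed into three compact memories instead of being kept verbatim in context. The folding rules here are toy heuristics standing in for whatever learned compression DeepAgent actually uses.

```python
def fold_memory(history):
    """Fold a growing interaction history into episodic, working,
    and tool memories, instead of keeping every past thought."""
    return {
        "episodic": [h["text"] for h in history if h["type"] == "event"][-3:],  # last few key events
        "working": history[-1]["text"] if history else "",                      # current focus
        "tool": sorted({h["tool"] for h in history if h.get("tool")}),          # tools encountered
    }

history = [
    {"type": "event", "text": "opened product page", "tool": "browser"},
    {"type": "thought", "text": "need the price"},
    {"type": "event", "text": "ran price search", "tool": "search"},
]
memory = fold_memory(history)
```

The payoff is the same as any summarizing memory: context size is bounded by the fold, not by how long the agent has been running.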
Your default cognitive environment is hostile to deep thought. If you consume a minute-by-minute feed optimized for engagement, you are training your mind to be reactive, shallow, short-term. The most potent way to think about the world is the opposite: analytical, long-term, and grounded in first principles. You won't develop it on social media. Maybe try books.
Cheat sheet for the Claude Code assistant
Japanese scientists have created a hydrogel that reverts cancer cells back to cancer stem cells in 24 hours
New work on Rethinking Thinking Tokens: LLMs as Improvement Operators: arxiv.org/abs/2510.01123 Reasoning training encourages LLMs to produce long chains of thought (CoT), improving accuracy via self-checking but increasing context length, compute cost, and latency. This work studies whether frontier models can achieve better trade-offs, higher accuracy with lower cost. This work develops a simple yet effective Parallel-Distill-Refine (PDR) procedure: Generate diverse drafts in parallel, Distill them into a compact textual workspace, and Refine conditioned on this workspace. This decouples context length from total token count, allowing control over compute via parallelism. PDR yields higher accuracy than long CoT at lower latency. Training an 8B model with RL to align with PDR further shifts the Pareto frontier. On math benchmarks, PDR achieves +11% (AIME 2024) and +9% (AIME 2025) over single-pass baselines. With Lovish Madaan, Aniket Didolkar, Suchin Gururangan, John Quan, Ruan Silva, Manzil Zaheer, Sanjeev Arora, and Anirudh Goyal.
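The Parallel-Distill-Refine procedure can be sketched in a few lines. In the real method, `generate`, `distill`, and `refine` are LLM calls; here they are toy stand-ins. The key property the sketch preserves: `refine` sees only the compact workspace, not all drafts concatenated, so context length is decoupled from total tokens spent.

```python
from collections import Counter

def pdr_round(problem, generate, distill, refine, n_drafts=4):
    drafts = [generate(problem, seed=i) for i in range(n_drafts)]  # diverse parallel drafts
    workspace = distill(drafts)        # compact textual workspace distilled from the drafts
    return refine(problem, workspace)  # final answer conditioned on the workspace only

# Toy demo: noisy drafts, majority vote as the "distilled workspace"
answer = pdr_round(
    "2+2",
    generate=lambda p, seed: 4 if seed % 4 else 5,                # one draft is wrong
    distill=lambda drafts: Counter(drafts).most_common(1)[0][0],  # keep the consensus
    refine=lambda p, ws: ws,
)
```

Compute is controlled by `n_drafts` (parallel width) and the number of rounds (depth), while each individual call stays short, which is exactly the accuracy-versus-latency trade the paper targets.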
Just spent 3 weeks learning Claude Code inside-out. Here's the full roadmap nobody's talking about:

BEGINNER (Week 1)
claude(.)md → Tell Claude your project rules once, never repeat yourself
Essential commands → claude code, claude chat, claude status
Bash mode → Let Claude run terminal commands (game changer)
Screenshot debugging → Snap it, send it, fixed
TDD → Write the test, Claude writes the code
Message queue → Stack up tasks, walk away, come back to done

INTERMEDIATE (Week 2)
Planning modes → Add "think" or "plan" so Claude doesn't rush
Research mode → "Write the docs" while you build
Changelogs → Auto-generated from commits
GitHub Actions → Claude reviews your PRs automatically
PM mindset → Tell Claude WHAT you want, not HOW to build it

ADVANCED (Week 3)
Parallel planning → Claude explores 3 solutions at once
Multiple instances → Run 2+ Claudes with Git worktrees
Custom commands → Build your own shortcuts
Specialized subagents → One Claude for tests, one for APIs, etc.
MCP servers → Connect Claude directly to your database
🔥 Gen AI has started creating solid ROI for enterprises. ~ A solid new Wharton study. 🧭 Gen AI in the enterprise has shifted from pilots to everyday use, with 82% using it at least weekly, 46% using it daily, and most leaders now measuring outcomes with 72% tracking return on investment and 74% already seeing positive returns. The study is a year-3 repeated survey of ~800 senior U.S. decision-makers in large companies, fielded from June to July 2025, so the numbers reflect real operations, not hype. Returns are showing up first where work is digital and process heavy, with Tech/Telecom at 88% positive ROI, Banking/Finance and Professional Services ~83%, Manufacturing 75%, and Retail 54%, while negative ROI is rare at <7%. On tools, ChatGPT sits at 67% organizational usage, Copilot at 58%, and Gemini at 49%, and the overwhelming majority of subscriptions are employer paid rather than employee expensed. Teams are standardizing on repeatable work, where data analysis (73%), document or meeting summarization (70%), document editing or writing (68%), presentation or report creation (68%), and idea generation (66%) are now common parts of the workday. Specialized use is rising by function, with code generation in IT (~72%), recruiting and onboarding in HR (~72%), and contract generation in Legal (56%) becoming normal rather than novel. --- Budget levels are large, with about 2/3 of enterprises investing $5M+ in Gen AI and Tier 1 firms likeliest to spend $20M+, which lines up with broader rollout and integration work. Looking forward, 88% expect budgets to rise in the next 12 months, and 62% expect increases >10%, while 87% believe their programs will deliver positive ROI within 2–3 years. Spending is becoming more disciplined, since 11% say they are cutting elsewhere to fund Gen AI, often trimming legacy IT or outside services as they double down on proven projects. Access is opening up while guardrails tighten, with ~70% allowing all employees to use Gen AI.
Laggards remain about 16% of decision-makers, often in Retail and Manufacturing, and they cite tighter restrictions, budget pressure, slow-adopting cultures, and lower trust, which leaves them at risk as peers lock in gains. 🧵 Read on 👇
An engineer at @Blocks (formerly Square) has an AI agent watching his screen all day. The engineer will discuss a feature with a colleague on Slack. A few hours later, the agent has already built the feature and opened a PR. This isn't some distant AI future. It's happening now, thanks to the work Block has done building their own internal (and open-source) AI agent called "Goose." I sat down with Dhanji Prasanna, CTO at Block, to understand how they're achieving what most companies are still trying to figure out. Engineers using Goose report saving 8-10 hours per week. Across the entire company—including support, legal, and risk teams—they're saving 20-25% of manual hours, which equates to over 100,000 hours per week (!!!). The most surprising finding: non-technical teams are the ones seeing the most productivity gains. Their enterprise risk management team built an entire self-service system, compressing weeks of work into hours. No waiting for Q2 roadmaps or internal apps teams. Dhanji walked me through their full transformation—from convincing Jack Dorsey with an "AI manifesto" to reorganizing from GMs to functional structure to shipping Goose as open source. The whole conversation is live now. Link in comments.
My biggest takeaways from Dhanji Prasanna, CTO of @Blocks: 1. Block’s internal AI agent "Goose" is saving employees on average 8 to 10 hours per week. The company built an open-source tool called Goose that handles tasks from organizing files to writing code. Across the entire company, they’re seeing roughly 20% to 25% of manual work hours saved, and that number keeps climbing. 2. Non-technical teams are getting the biggest productivity boost from AI, not engineers. People in legal, risk management, and operations are now building their own software tools that previously would have required months on an engineering team’s roadmap. What used to take weeks now takes hours, and employees do it themselves without waiting. 3. Changing organizational structure unlocked more productivity than any AI tool. To transform into a truly “technology driven” company, Block reorganized from separate business units (each with their own GM and engineering teams) to a single functional structure where all engineers report to one leader. This “boring” change enabled a unified technology strategy and drove more acceleration than any AI tool. 4. Code quality has almost nothing to do with product success. YouTube became one of Google’s most successful products despite storing videos as blobs in a MySQL database with a slow Python stack. Meanwhile, Google Video had superior technology with more formats and higher resolution but failed completely. The lesson: Focus on solving real problems for people, not on perfect code. 5. AI enables teams to explore multiple paths simultaneously instead of choosing one up front. Previously, limited resources meant teams had to pick their best guess for an experiment. Now AI can build multiple different approaches overnight, allowing teams to compare five or six options and throw away entire features if they don’t feel right—a practice that was unthinkable before. 6. Most successful products start as tiny experiments, not big initiatives. 
Cash App began as a hack-week idea. Goose started as one engineer’s side project. Block’s Bitcoin product came from a three-person hackathon team. In contrast, Google Wave had 70 to 80 engineers before having real users and failed. Small experiments that prove value beat large up-front investments. 7. Leaders must use AI tools daily to drive real organizational adoption. Block’s CEO Jack Dorsey, the CTO, and the entire executive team use Goose every single day. This hands-on experience teaches them how workflows actually change and drives authentic adoption throughout the organization far more than reading articles or attending conferences about AI. 8. AI excels at new projects but struggles with complex legacy systems. Teams building new applications or working on greenfield platforms see aggressive productivity gains. But in existing codebases with years of accumulated complexity, the gains aren’t there yet. Deploy AI where it works best rather than everywhere at once. 9. Giving away valuable technology for free can be a winning strategy. Block open-sourced Goose even though it could have been a standalone billion-dollar business. Even their competitors actively use it. The philosophy: build things that benefit everyone and outlast your own company. This commitment to open-source technology attracts talent and builds industry goodwill while advancing everyone’s capabilities. 10. Purpose should drive your technology choices, not the other way around. Rather than chasing every AI trend or trying to be at the forefront of every technology, identify what truly matters to your company and customers. Block stays focused on economic empowerment, which guides their technology decisions and keeps them from getting distracted by every new advancement. 
Listen now 👇 • YouTube: piped.video/JMeXWVw0r3E • Spotify: open.spotify.com/episode/1ZL… • Apple: podcasts.apple.com/us/podcas… Thank you to our wonderful sponsors for supporting the podcast: 🏆 @wearesinch — Build messaging, email, and calling into your product: sinch.com/lenny 🏆 @Figma Make — A prompt-to-code tool for making ideas real: figma.com/lenny/ 🏆 @Persona_IDV — A global leader in digital identity verification: A
🧵 LoRA vs full fine-tuning: same performance ≠ same solution. Our NeurIPS ‘25 paper 🎉 shows that LoRA and full fine-tuning, even when equally well fit, learn structurally different solutions, and that LoRA forgets less and can be made to forget even less with a simple intervention! Read on for behavioral differences (forgetting, continual learning) and other analysis! Paper: arxiv.org/pdf/2410.21228 (1/7)
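For readers new to the comparison: LoRA constrains the fine-tuning update to a low-rank factorization, while full fine-tuning updates every entry of the weight matrix. A minimal sketch of the LoRA parameterization (shapes and scaling follow the original LoRA paper; this is background, not this paper's code):

```python
import numpy as np

d, k, r, alpha = 8, 8, 2, 4          # weight shape (d, k), rank r, scaling alpha
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d, k))          # frozen pretrained weight matrix
A = rng.normal(size=(r, k))           # trainable rank-r factor (random init)
B = np.zeros((d, r))                  # trainable factor, zero-init so training starts at W0

W_eff = W0 + (alpha / r) * (B @ A)    # effective weights; the update B @ A has rank <= r
```

The structural difference the thread studies follows directly from this form: whatever LoRA learns, its update to `W0` can never exceed rank `r`, whereas a full fine-tuning update is unconstrained.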