The fastest way to ship reliable AI apps - Evaluation, Experimentation, and Observability Platform

SF & NYC
Joined June 2021
Agent benchmarks are great, but you need to actually understand your end user's experience. We've released four new agentic metrics, enabling you to track how your agents behave in real-world scenarios and how users experience flow, efficiency, intent change, and conversation quality, all available out of the box.

Agent Flow: Measure the correctness and coherence of an agentic trajectory by validating it against user-specified natural language tests. When building multi-step agents, this metric helps you catch deviations from expected behavior patterns.

Agent Efficiency: Assess the efficiency of your agentic workflows. An agentic session is considered efficient or optimal when the agent provides a precise answer or resolution to every user ask via an efficient path. This metric helps you identify unnecessary tool calls, redundant questions, and bloated workflows.

Conversation Quality: Evaluate the overall user experience across multi-turn conversations. Beyond accuracy, it measures whether interactions leave users satisfied or frustrated, which is critical for customer-facing applications.

Intent Change: Detect when users shift their goals mid-conversation. These shifts often indicate gaps in your agent's ability to handle the initial request, providing clear signals for improvement.

Each metric is designed for production agents, provides granular visibility into behavior that traditional evals miss, and can be customized to your specific application, further extending our agent evals capabilities (which you can also access via our new MCP!). These are available to use for free today in Galileo. Learn more about the new metrics below 👇
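For intuition on the kind of behavior the efficiency metric surfaces, here is a toy sketch that scans a logged agent session for redundant tool calls and unanswered user asks. The session format and heuristic are illustrative assumptions for this example only, not Galileo's actual scorer; see the docs below for the real metrics.

```python
# Toy illustration of what an efficiency-style metric looks at: for a logged
# agent session, flag redundant tool calls and unanswered user asks.
# This is a simplified heuristic for intuition only, NOT Galileo's scorer.
from collections import Counter

session = [
    {"role": "user", "content": "What's the refund policy for annual plans?"},
    {"role": "tool", "name": "search_docs", "args": "refund policy annual"},
    {"role": "tool", "name": "search_docs", "args": "refund policy annual"},  # duplicate call
    {"role": "assistant", "content": "Annual plans can be refunded within 30 days."},
]

def efficiency_report(steps):
    # Count identical (tool, args) pairs; anything beyond the first is redundant.
    tool_calls = Counter((s["name"], s["args"]) for s in steps if s["role"] == "tool")
    redundant = sum(count - 1 for count in tool_calls.values())
    user_asks = sum(1 for s in steps if s["role"] == "user")
    answers = sum(1 for s in steps if s["role"] == "assistant")
    return {
        "redundant_tool_calls": redundant,
        "unanswered_asks": max(user_asks - answers, 0),
    }

print(efficiency_report(session))  # {'redundant_tool_calls': 1, 'unanswered_asks': 0}
```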
Crazy seeing @rungalileo buses in the wild with @SFMTA_Muni
Galileo retweeted
Just wrapped up the @daytonaio HackSprint in San Francisco - an in-person hackathon focused on building AI agents with sharp reasoning and independent decision-making. Had the privilege to hack with @AnelyaGrant, our Co-Founder & CPO at @getjustpaid, where we vibe coded a design layout for finance professionals.

What made this stand out:
→ Location: 660 Market St, San Francisco
→ Partners: @AnthropicAI, @rungalileo, @browser_use, and @WorkOS
→ Focus: Designing agents that demonstrate originality, technical strength, and real-world impact
→ Challenge: Safe integration with industry-relevant tools

The shift toward Agentic AI continues to reshape how we think about automation. We're moving beyond simple task execution to systems that can reason, decide, and operate independently. This is the kind of event that reminds you why SF remains the epicenter for AI innovation. Big thanks to the Daytona team for putting this together.

#AI #AgenticAI #Hackathon #SanFrancisco
Happy to share what we've been up to for the past few weeks! Develop agents at a rapid pace with Galileo's MCP right in your favorite IDE.
What if you could get insights into why your agent failed, and apply the fix without ever leaving your IDE? We're launching our Agent Evals MCP to make this a reality. 💪

With one config file, you can now access our evaluation and observability capabilities directly in Cursor or VS Code. No context switching. No manual copy-paste. Just eval-powered insights where you actually build.

Our new MCP server enables:
🔍 Instant root cause analysis: Get logstream insights that pinpoint precisely where and why agents deviate
📊 Synthetic dataset generation: Create test data directly in your IDE with natural language requests
✍️ Prompt template management: Set up and validate templates without leaving your development environment
⚡ Real-time integration guidance: Your AI assistant can now suggest and apply Galileo instrumentation directly to your codebase

Agent reliability should start where you code. Get started with the docs below 👇
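For reference, a typical IDE-side MCP setup looks something like the sketch below, which writes a Cursor-style config entry. Only the `mcpServers` layout is the standard Cursor config shape; the server name, launch command, package, and environment variable are placeholders rather than Galileo's actual values, so consult the linked docs for the real configuration.

```python
# Minimal sketch: generate a Cursor-style MCP config entry for an MCP server.
# The server name, command, package, and env var below are PLACEHOLDERS --
# check Galileo's MCP docs for the real values. Only the "mcpServers" layout
# is the standard Cursor config shape.
import json
from pathlib import Path

config_path = Path(".cursor/mcp.json")
config_path.parent.mkdir(exist_ok=True)

config = {
    "mcpServers": {
        "galileo": {                                   # hypothetical server name
            "command": "npx",                          # placeholder launch command
            "args": ["-y", "@galileo/mcp-server"],     # hypothetical package name
            "env": {"GALILEO_API_KEY": "<your-api-key>"},
        }
    }
}

config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote MCP config to {config_path}")
```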
👀 🚌 Spotted in SF! We just launched our new bus ads across San Francisco because if you want your AI to work reliably in production, you need thorough evaluations, monitoring, and guardrails 🔒

If you're in the Bay Area and want a chance at winning some Galileo swag, now's your chance. All you have to do is:
👀 Spot one of our buses in the city
📸 Take a photo
💻 Post it on LinkedIn or X
✅ Tag @Galileo

Our buses will be circulating for the next few weeks, and we'll announce the winner on Nov. 12th 📅
On Saturday, the inaugural @daytonaio HackSprint brought San Francisco's AI builder community together, and what the community built in just six hours was incredible 💪

Our Solutions Architect, Vyoma Gajjar, was onsite where she awarded the following builders for the best use of Galileo:
1️⃣ BioScout by Mahtabin Rodela – Connects biotech investors with emerging research and patents, making discovery faster and more accessible.
2️⃣ Smart Treasury Agent by @nicocapetillo – An AI copilot for CFOs that simulates treasury strategies in real time, bringing sophisticated financial modeling to more companies.
3️⃣ Peazy Trainer by Kushal Murthy & Komala Chenna – A voice-guided AI trainer that teaches software through live, hands-on sandboxes.

Each team excelled at using evaluations and observability to understand agent behavior, catch errors early, and ship reliable agents. Huge congrats to all three winners, and thank you to everyone who participated. The quality of work, the collaboration, and the energy in the room reminded us why we love this community.

Thanks to @JukicVedran and the Daytona team for creating a space where builders could do their best work, @WorkOS for hosting us in their amazing office, and the @AnthropicAI team for sponsoring as well 🤝

We can't wait to see what you build next, and we're excited to participate in more HackSprints soon 👀
Galileo retweeted
🚀 Yesterday we hosted the first-ever Daytona HackSprint in SF, powered by @AnthropicAI, @rungalileo, @browser_use & @WorkOS at the amazing WorkOS Office!

🏆 So many great projects - here are the winners ⬇️
🥇 A/B GPT – AI that finds & fixes UX issues autonomously using Daytona + Browser Use
🥈 PolySandbox – Unified sandbox orchestrator for AI code evals across backends
🥉 QoalA – Automated website QA with Browser-Use + Daytona

🎖️ Best Use of Galileo
1️⃣ BioScout – AI scouting biotech patents
2️⃣ Peazy Trainer – Voice-led AI training in sandboxes
3️⃣ Smart Treasury Agent – AI copilot for corporate finance

Huge congrats to all the winners, and a big thank you to everyone who participated, including the judges, volunteers, and partner representatives! 🚀
Galileo retweeted
We just kicked off our first @daytonaio HackSprint at the beautiful @WorkOS office in SF! After demos from our amazing partners @rungalileo, @browser_use, @WorkOS & @AnthropicAI, the hacking has begun. We can't wait to see what awesome projects our builders create today! 🚀
📅 HackSprint SF with @daytonaio, @AnthropicAI, @browser_use, and @WorkOS is tomorrow! If you're in SF, join us for a day of hacking agents and a chance to win prizes from a prize pool of over $40,000 💰 Register here: luma.com/bh7auv0t
On October 18th, we're excited to partner with @daytonaio for their HackSprint in SF, alongside @AnthropicAI, @browser_use, @WorkOS, and more. We're bringing together San Francisco's brightest AI builders for an intense one-day sprint to create projects at the frontier of AI.

The Challenge? Build AI agents with sharp reasoning, independent decision-making, and safe integration with industry-relevant tools. Participants will have six hours to build, and three minutes to present. Every participant gets $50 in Claude API credits from Anthropic and $100 in Daytona credits, and the prize pool exceeds $40,000 👀

Our Solutions Architect, Vyoma Gajjar, will be judging alongside @TobinSouth (Head of MCP and Agents at WorkOS), @JukicVedran (Co-founder & CTO at Daytona), and other leaders in AI.

Don't miss this. Register here: luma.com/bh7auv0t
Galileo retweeted
What does it really take to build an AI startup that lasts? Founders from @rungalileo, @JinaAI_, & @llama_index join @benchmark's @chetanp live for an honest convo on scaling, funding & building AI products that actually deliver. Save your seat: go.es.io/42JX9l9
Galileo retweeted
We recently welcomed new members to our Enterprise AI Ecosystem: @ServiceNow, @rungalileo, @Pulse__AI, and Fundamental. In doing so, we're able to deliver complete, production-ready AI solutions to customers. Our Abhas Ricky says it best: "We're only as good as our ecosystem and we take pride in that." Read more about this commitment in @CRN:
Partnering with JPMorganChase has been a blast 💥

The impact of agentic applications within large financial services organizations will be massive, but for financial service agents to be successful, every interaction needs to be monitored, evaluated, and protected in real time to foster trust among users.

Our platform is purpose-built for this with agent observability and runtime protection, which:
✅ Scales to billions of agent paths
✅ Stops errors before they reach users with millisecond latencies
✅ Can ingest millions of user queries a day

Excited to share more soon 👀
Galileo retweeted
Daytona's October lineup 🔥

📍 San Francisco 🌉
AI Builders - Oct 14
HackSprint - Oct 18

📍 New York City 🗽
AI Builders - Oct 20

Thanks to @AnthropicAI, @datadoghq, @brexHQ, @rungalileo, @browser_use & @WorkOS for powering the builder community! 🚀

RSVP links for the events in comments ⬇️
Production-ready agents require complex coordination across different model types, memory systems, and tool calls. This is the shift from prompt engineering to context engineering.

When you're building production AI systems, you're architecting systems that coordinate:
→ Multiple model sizes (small, mid-size, large)
→ Different training approaches (fine-tuned, distilled, proprietary, open-source)
→ Multi-modal capabilities across different models
→ Memory systems that persist and retrieve context
→ Tool calls that extend model capabilities

Each component of your system needs the right context at the right time; otherwise, agents can hallucinate, miss critical cues, or fail unpredictably.

@Aishwarya_Sri0 joined us on the Chain of Thought Podcast last week to break down the shift to architecting AI systems. Watch the full conversation with host @ConorBronsdon below 👇
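To make the idea concrete, here is a minimal sketch of context engineering: assembling only the relevant memory and tool results into each step's prompt and routing to a small or large model by task complexity. All names (the memory store, routing rule, model labels) are illustrative assumptions, not a specific framework's API.

```python
# Minimal sketch of "context engineering": assemble the right context for each
# step from memory and tool results, then pick a model size for the task.
# All names and heuristics here are illustrative, not a specific framework.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy long-term memory: naive keyword match over stored notes."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str, k: int = 3) -> list[str]:
        scored = [(sum(w in n.lower() for w in query.lower().split()), n) for n in self.notes]
        return [n for score, n in sorted(scored, reverse=True)[:k] if score > 0]

def pick_model(task: str) -> str:
    # Route longer or planning-heavy tasks to a larger model (names are placeholders).
    return "large-reasoning-model" if len(task.split()) > 40 or "plan" in task.lower() else "small-fast-model"

def build_context(task: str, memory: MemoryStore, tool_results: dict[str, str]) -> str:
    # Only the pieces relevant to *this* step go into the prompt.
    sections = [f"Task: {task}"]
    recalled = memory.recall(task)
    if recalled:
        sections.append("Relevant memory:\n" + "\n".join(f"- {m}" for m in recalled))
    if tool_results:
        sections.append("Tool results:\n" + "\n".join(f"- {k}: {v}" for k, v in tool_results.items()))
    return "\n\n".join(sections)

if __name__ == "__main__":
    memory = MemoryStore()
    memory.remember("User prefers concise answers with citations.")
    task = "Plan a migration of the billing service to the new payments API."
    prompt = build_context(task, memory, {"search": "3 internal docs found"})
    print(f"model={pick_model(task)}\n\n{prompt}")
```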
Your multi-agent system hits production and encounters edge cases you never tested. 47 LLM calls, 12 tool invocations, 8 agent handoffs, but how do you know which one failed?

Production-ready multi-agent systems need observability from day one. Our latest blog shows you how to build a LangGraph multi-agent system that routes queries to specialized agents. The architecture worked in testing, but in production, there was context loss between agent handoffs, redundant tool calls, and incorrect routing decisions.

The solution? Systematic observability at every layer:
Track every decision in real time: Observability tools show which agent handled each query, what tools they invoked, and where latency bottlenecks appear.
Debug with actual context: Our Insights Engine automatically surfaces failure patterns so you don't have to dig through traces manually.
Improve based on data: See which metrics offer clear performance targets for maintaining user satisfaction and operational excellence.

Read our blog to learn the complete implementation and techniques you can apply to any multi-agent architecture 👇
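As a rough sketch of the routing pattern described above, here is a minimal LangGraph graph that sends queries to specialized agents, with a print-based tracing wrapper standing in for real instrumentation. The agent logic, routing rule, and tracing are illustrative assumptions, not the blog's implementation; the blog covers wiring this up to Galileo's observability.

```python
# Minimal sketch: LangGraph router that dispatches queries to specialized agents,
# with a toy tracing wrapper in place of a real observability integration.
import time
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    query: str
    route: str
    answer: str

def traced(name, fn):
    """Wrap a node so every invocation is logged with its latency."""
    def wrapper(state: State) -> dict:
        start = time.perf_counter()
        out = fn(state)
        print(f"[trace] node={name} latency_ms={(time.perf_counter() - start) * 1000:.1f}")
        return out
    return wrapper

def router(state: State) -> dict:
    # Toy routing rule: billing-sounding queries go to the billing agent.
    route = "billing" if any(w in state["query"].lower() for w in ("invoice", "charge")) else "support"
    return {"route": route}

def billing_agent(state: State) -> dict:
    return {"answer": f"[billing agent] handling: {state['query']}"}

def support_agent(state: State) -> dict:
    return {"answer": f"[support agent] handling: {state['query']}"}

graph = StateGraph(State)
graph.add_node("router", traced("router", router))
graph.add_node("billing", traced("billing", billing_agent))
graph.add_node("support", traced("support", support_agent))
graph.add_edge(START, "router")
graph.add_conditional_edges("router", lambda s: s["route"], {"billing": "billing", "support": "support"})
graph.add_edge("billing", END)
graph.add_edge("support", END)
app = graph.compile()

result = app.invoke({"query": "Why was I charged twice on my invoice?", "route": "", "answer": ""})
print(result["answer"])
```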