random rant on where we are with ai agents:

some don’t call them agents, but “workflow agents” with deterministic flows are everywhere and they work. anybody can build simple workflow agents, even starting with no-code tools like Zapier and n8n. complex workflow agents require much more thought to build reliably and efficiently. a complex workflow for a common and valuable use case, with the relevant integrations baked in, can stand alone as a business, and is also a great GTM wedge to expand later into other workflows or more autonomous work.
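
to make “deterministic flow” concrete, here’s a minimal sketch: the steps are fixed in code and the model only fills in content at each step. call_llm() and the lead/email fields are placeholders i made up, not any particular product’s API.

```python
# minimal sketch of a deterministic workflow agent: the flow is the program,
# the model only fills in content at each step. call_llm() is a placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def enrich_lead(lead: dict) -> dict:
    summary = call_llm(f"summarize this company from its website text:\n{lead['site_text']}")
    draft = call_llm(f"write a short intro email to {lead['name']} based on:\n{summary}")
    return {**lead, "summary": summary, "draft_email": draft}

def run_workflow(leads: list[dict]) -> list[dict]:
    # no planning, no tool choice -- the same steps run for every lead
    return [enrich_lead(lead) for lead in leads]
```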

more dynamic/autonomous agents are starting to work and are genuinely helpful for research (especially if web based) and coding. they get less reliable once you start adding more data sources (eg APIs). read-only agents feel safe and easy to test, but letting autonomous agents take action (write) is scary. (random idea on this: would be cool if tools like a CRM let you “fork” a dev mirror and run automation experiments that you can roll back or merge back in.)

dynamic agents work well when they can (1) create and track a good plan and (2) execute tasks correctly, while (3) finding the right context to feed into each step (both the planning and each task). finally, they need to (4) reflect along the way (with or without human input) so they can adjust the plan appropriately and improve how they execute failed or poor-performing tasks.
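
a rough sketch of those four pieces as a single loop - every function body here is a stand-in for an LLM call or tool call, not a real implementation:

```python
# plan -> find context -> execute -> reflect, with the plan and learnings held
# outside the prompt. all function bodies are placeholders.

from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    done: bool = False
    result: str | None = None

@dataclass
class AgentState:
    goal: str
    plan: list[Task] = field(default_factory=list)
    learnings: list[str] = field(default_factory=list)

def make_plan(goal: str) -> list[Task]:
    # (1) in practice an LLM call; here a single placeholder task
    return [Task(description=f"work on: {goal}")]

def find_context(task: Task, state: AgentState) -> str:
    # (3) pull only what's relevant to this step (docs, prior results, learnings)
    return "\n".join(state.learnings)

def execute(task: Task, context: str) -> str:
    # (2) an api call, an llm step, or a hand-off to a workflow agent
    return f"result of '{task.description}'"

def reflect(task: Task, state: AgentState) -> None:
    # (4) judge the result; may append tasks, reorder the plan, or store a learning
    state.learnings.append(f"completed: {task.description}")

def run(goal: str) -> AgentState:
    state = AgentState(goal=goal, plan=make_plan(goal))
    while any(not t.done for t in state.plan):
        task = next(t for t in state.plan if not t.done)
        task.result = execute(task, find_context(task, state))
        task.done = True
        reflect(task, state)
    return state
```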

task planning: llm reasoning capabilities work fine for simple task lists that require no private context (like deep research: just a series of web searches with summarizing along the way). if you want to research a lot of entities, deep research doesn’t work as well because its task list management is relatively basic. spreadsheet-based AI tools work better for researching many entities because you’re effectively offloading the task management to the spreadsheet - passing long task lists between prompts doesn’t work here. task management in coding agents works with simple problems, simple code, or when you’re starting from scratch. once you get into more complex pre-existing projects, they are less reliable - and devs increase reliability by documenting how their code works and is organized (.md files), which lets the agent build better-informed task lists. complex code requires more documents, and eventually dynamically pulling only the relevant context from those documents. a lot of people/businesses have strong undocumented opinions on the correct order/approach/tools to tackle a project, and we need more ways to document this upfront and on the fly. another reason coding and web-based research agents work well is that they all use the same set of tools, so there’s no need to “learn” how to use those tools (more on this next).
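
to illustrate “offloading the task management to the spreadsheet”: the entity list lives in a csv, the sheet itself tracks what’s done, and each prompt only ever sees one row. the column names (“company”, “notes”) and research_entity() are assumptions for the sketch, not any specific product.

```python
# sketch: the spreadsheet is the task list, not the prompt.
import csv

def research_entity(name: str) -> str:
    return f"notes on {name}"   # placeholder for web search + summarize

def run_from_sheet(path: str) -> None:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return
    for row in rows:
        if not row.get("notes"):        # the sheet tracks progress between runs
            row["notes"] = research_entity(row["company"])
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```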

task execution: tasks are usually api calls (requiring auth and an understanding of how to use the api and the underlying data structure - which can be unique, like a crm or db with custom tables/columns), llm reasoning (eg summarize), a combination of the two, or even hand-offs to workflow agents*. a research agent is really just web search and summarization in a loop. coding agents are CRUD on your code base, plus maybe web search for learning APIs. auth and basic api access feel solved (MCPs fit here), but i’d like to see more around tool-specific context: ask the user, but also analyze the tool upon initial connection - dig into existing data to understand how the tool is used, how the data is structured, and what scenarios/projects it’s used for. errors/reflection/feedback need to turn into organized learnings that get fed back in as context when relevant. the same tools can be used for different purposes and in different ways between orgs, and we need to capture/document this somehow to execute tasks well.
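
one way to picture “errors turning into organized learnings”: wrap every tool call, store a short learning keyed by tool when it fails, and inject those learnings only when that tool is about to be used again. the shapes here are hypothetical, not any specific framework’s API.

```python
# sketch: tool failures become per-tool learnings that get pulled back into the
# prompt only when that tool comes up again.

from collections import defaultdict
from typing import Callable

tool_learnings: dict[str, list[str]] = defaultdict(list)

def call_tool(name: str, fn: Callable[..., str], **kwargs) -> str:
    try:
        return fn(**kwargs)
    except Exception as exc:
        # in practice you'd have an llm summarize the failure, not store it raw
        tool_learnings[name].append(f"failed with {kwargs}: {exc}")
        raise

def context_for_tool(name: str) -> str:
    # injected into the prompt only when this specific tool is about to be used
    return "\n".join(tool_learnings[name][-5:])
```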

context: imagine being a new employee at a company. you learn a lot during onboarding (and the better the onboarding, the more effective you are out of the gate), and then there’s learning on the job, which breaks down into learning from the org’s experience (“this is how we do things”) and learning from your own experience - the former is more predominant in large orgs. context management is similar. there are layers of context: meta (user/company), project/dept specific, task specific, tool specific, etc. we’ve evolved from simple system prompts to hybrid RAG strategies (vector, keyword, graph), but beyond having the data/context, we need guidance on when and how to retrieve it - we see early versions of this today, with lots of room for improvement. this is not merely a technical problem but also a business one, as you basically need to create an onboarding doc that covers every scenario you expect. as projects get more complicated, it takes more thoughtfulness to correctly prune the context so only relevant information gets included in the prompt, while minimizing irrelevant context.
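
a sketch of what layered retrieval plus pruning could look like - retrieve() and relevance() are stand-ins for whatever vector/keyword/graph stack and reranker you actually use:

```python
# sketch of layered context with a budget: pull candidates per layer, rank by
# relevance to the task, and only keep what fits.

def retrieve(layer: str, query: str) -> list[str]:
    return []    # search that layer's docs; empty here so the sketch still runs

def relevance(snippet: str, query: str) -> float:
    return 0.0   # reranker score or embedding similarity

def build_context(query: str, budget_chars: int = 4000) -> str:
    candidates: list[str] = []
    for layer in ("company", "project", "task", "tool"):
        candidates += retrieve(layer, query)
    candidates.sort(key=lambda s: relevance(s, query), reverse=True)
    picked, used = [], 0
    for snippet in candidates:
        if used + len(snippet) > budget_chars:
            continue          # skip anything that would blow the budget
        picked.append(snippet)
        used += len(snippet)
    return "\n\n".join(picked)
```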

reflection: we have agent monitoring tools that cover llm/api costs and observability, but assigning success/failure is still a challenge - one area where coding agents have a leg up on others is a deterministic way to notice failures (by running tests against the code). for many other agentic tasks, we’re still figuring out the right way to collect human input to improve future output. afaik, reflection today is human-in-the-loop, where the feedback is largely fed to human devs who then improve the agent, but the unlock comes when we figure out how to turn reflection into self-improvement - where the agent takes insights from failures in task list generation and task execution and does better next time. basically, reflection needs to turn into well-organized context that can be pulled into prompts when, and only when, it’s relevant. this eventually evolves into fine-tuning pieces of the agent, and then agentic RL environments - still feels pretty early here.
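
one possible shape for “reflection becomes context pulled in only when relevant”: each failed task writes a short learning tagged with when it applies, and future prompts only include learnings whose tags overlap with the current task. the judge here could be deterministic (a test suite), a human, or an llm grader - all the names below are made up.

```python
# sketch: a failed task writes a tagged learning; future prompts only include
# learnings whose tags overlap with the task at hand.

from dataclasses import dataclass

@dataclass
class Learning:
    tags: set[str]   # e.g. {"crm", "write", "dedupe"}
    note: str        # what to do differently next time

memory: list[Learning] = []

def reflect_on(task_tags: set[str], passed: bool, note: str) -> None:
    # `passed` could come from a test suite (coding), a human, or an llm grader
    if not passed:
        memory.append(Learning(tags=task_tags, note=note))

def learnings_for(task_tags: set[str]) -> str:
    # pulled into the prompt only when tags overlap with the current task
    return "\n".join(l.note for l in memory if l.tags & task_tags)
```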

*earlier i mentioned handing off tasks to workflow agents, which starts to make sense when your agent would benefit from having known workflows available as tools (vs figuring out the same task list from scratch each time), or when your system is complicated enough that specialized agents with specialized context and tools perform better. or if you’re leveraging agents built by other people (one pattern i’ve started to see here is natural language api endpoints for easier agent collaboration).
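
the natural-language-endpoint pattern in its simplest form: the entire contract between two agents is free text in, free text out, so the calling agent never has to learn a schema. specialist_agent() is a made-up stand-in for someone else’s specialized agent.

```python
# sketch: a natural language endpoint. free text in, free text out.

def specialist_agent(request: str) -> str:
    # a workflow agent or specialized dynamic agent with its own context and tools
    return f"handled: {request}"

def nl_endpoint(request: str) -> str:
    # in production this sits behind an http route with auth; the interface is
    # still just one free-text field either way
    return specialist_agent(request)

if __name__ == "__main__":
    print(nl_endpoint("find the top 5 open deals that haven't been touched in 30 days"))
```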

if we had today’s model quality with an infinite context window (no degradation in quality), infinite compute, infinite storage, browser access, and a payment method, a single LLM loop is probably enough to get a lot done. the point of the pointless point above (nothing is infinite) is that agent orchestration is largely about managing limitations by architecting ways to offload work from the LLM through structure and code.

agents in production come in various flavors: as internal tools, as a standalone product that combines various tools, or baked into a core tool as a feature. they can be generic or specialized. chat, voice, and background agents seem to be the most common interfaces for triggering agentic flows.
what else am i missing?