VC by day @untappedvc, builder by night: @babyagi_, @pippinlovesyou @pixelbeastsnft. Build-in-public log: yohei.me

Seattle, WA
Joined April 2009
Pinned Tweet
announcing untapped capital fund II - pre-seed, generalist, ~$250k checks
if you're already starting to fine-tune open source models for smaller tasks, something like the @runanywhereai sdk can let you offload some of the inference to user devices, lowering cloud inference costs and latency. but there are probably more fun/unique applications i'm not thinking of
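rough sketch of the routing idea (the helper names below are made up for illustration, not the actual runanywhere sdk):

```python
# a minimal sketch of cloud/device inference routing -- small fine-tuned tasks
# run locally, anything heavier falls back to a cloud model. all calls are stubs.

from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    needs_long_context: bool = False

def run_on_device(task: Task) -> str:
    # stand-in for a small fine-tuned model running on the user's device
    return f"[device model] {task.prompt[:40]}..."

def run_in_cloud(task: Task) -> str:
    # stand-in for a hosted frontier-model api call
    return f"[cloud model] {task.prompt[:40]}..."

def route(task: Task) -> str:
    # offload the cheap, narrow tasks; keep the heavy ones in the cloud
    if not task.needs_long_context and len(task.prompt) < 2_000:
        return run_on_device(task)
    return run_in_cloud(task)

if __name__ == "__main__":
    print(route(Task("classify this email as portfolio / LP / other")))
```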
ollama for mobile, picking up attention
too many cool people, building too many cool things 😵‍💫
just noticed "mean vc" is at 25k+ chats! (if you're looking to refine your pitch, give it a go)
now I really want a good ai first email client with strong CRM integration (that’s not a sales tool)
this weekend, i vibe coded a prototype email client that i want...
- for each email, looks up contact details + notes in attio, then org details, and checks lists to determine if portfolio, LP, etc.
- summarizes this context and creates a content relevance score
- uses this context to categorize and tag emails
- uses this to decide if it's a quick reply, research and reply, or take action and reply
- if research/action, suggests steps
- drafts an email based on all of this
- i can create custom rules/prompts per category, which get auto-updated when i edit a suggested action item or draft
far from ready, but cool to see it starting to work with real data
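roughly the shape of the flow, as a sketch (attio lookups and llm calls are stubbed with made-up helpers, not the real code):

```python
# sketch of the per-email pipeline described above -- the attio lookup and llm
# calls are stubs; the point is the shape of the flow, not a working client.

def lookup_context(email):
    # stand-in for: pull contact + org details and list membership from attio
    return {"contact": email["from"], "org": "unknown", "lists": []}

def summarize_and_score(email, context):
    # stand-in for an llm call that condenses context and scores relevance 0-1
    return {"summary": f"{email['subject']} from {context['contact']}", "score": 0.5}

def categorize(email, summary):
    # stand-in llm call: tag + decide quick reply / research+reply / action+reply
    return {"category": "general", "mode": "quick_reply", "steps": []}

def draft_reply(email, summary, decision, rules):
    # stand-in llm call that drafts a reply using everything gathered so far
    return f"(draft for: {email['subject']})"

def process_email(email, rules):
    context = lookup_context(email)
    summary = summarize_and_score(email, context)
    decision = categorize(email, summary)
    draft = draft_reply(email, summary, decision, rules)
    return {**summary, **decision, "draft": draft}

if __name__ == "__main__":
    rules = {"portfolio": "always offer a call"}  # custom per-category prompts
    print(process_email({"from": "a@b.com", "subject": "intro request"}, rules))
```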
original prompt for this project, in case you're curious (built with @replit agent)
Replying to @jenny____r
i use @replit, but probably similar to how you use lovable (just chat)
tl;dr on my agent thread from this weekend by @grok
random rant on where we are with ai agents:
ai can draft legal docs, but you still want a lawyer to review them. i’m a fan (and investor) of services like @DocDraftai that combine the best of both into a single elegant service :)
Just noticed the image @yoheinakajima uses for @DocDraftai on his portfolio page. Perfect encapsulation of DocDraft!
what else am i missing?
agents in production come in various flavors: as internal tools, as standalone products that combine various tools, and baked in as a feature of a core tool. they can be generic or specialized. chat, voice, and background agents seem to be the most common UI interfaces for triggering agentic flows.
if we had today’s model quality with an infinite context window (no degradation in quality), infinite compute, infinite storage, browser access, and a payment method, a single LLM loop would probably be enough to get a lot done. the point of the pointless point above (nothing is infinite) is that agent orchestration is largely about managing limitations by architecting ways to offload work from the LLM through structure and code.
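what i mean by "a single LLM loop", as a sketch (the llm and tool calls are stubs):

```python
# a minimal single-llm agent loop: ask the model what to do next, run the tool,
# feed the result back in, repeat. everything here is a stub for illustration.

def llm(messages):
    # stand-in for a chat-completions call; returns either a tool request or an answer
    return {"tool": None, "answer": "done"}

def run_tool(name, args):
    # stand-in for browser access, payments, storage, etc.
    return f"result of {name}({args})"

def agent_loop(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = llm(messages)
        if step["tool"] is None:
            return step["answer"]
        observation = run_tool(step["tool"], step.get("args", {}))
        messages.append({"role": "tool", "content": observation})
    return "ran out of steps"

if __name__ == "__main__":
    print(agent_loop("research ollama-for-mobile projects and summarize"))
```

orchestration is basically everything you bolt around this loop once the context window, compute, and budget stop being infinite.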
*earlier i mentioned handing off tasks to workflow agents, which start to make sense when your agent would benefit from having known workflow agents as tools (vs figuring out a known task list each time), or when your system is complicated enough that specialized agents with specialized context and tools perform better. or if you’re leveraging agents built by other people (one pattern i’ve started to see here is natural language api endpoints for easier agent collaboration).
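a sketch of that natural-language-endpoint pattern (flask is just for illustration; the workflow behind it is a stub):

```python
# sketch of a workflow agent exposed as a single natural language endpoint,
# so other agents can call it like a tool instead of re-planning the workflow.

from flask import Flask, request, jsonify

app = Flask(__name__)

def run_research_workflow(request_text: str) -> str:
    # stand-in for a fixed, specialized workflow (search -> summarize -> cite)
    return f"summary for: {request_text}"

@app.post("/agent/research")
def research():
    # the whole interface is one plain-english field
    task = request.get_json().get("request", "")
    return jsonify({"result": run_research_workflow(task)})

if __name__ == "__main__":
    app.run(port=8080)
```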
reflection: we have agent monitoring tools that cover LLM/api costs and observability, but assigning success/failure is a challenge - one area where coding agents have a leg up on others is a deterministic way to notice failures (through testing of the code). for many other agentic tasks, we’re still figuring out the right way to collect human input to improve future output. afaik, reflection today is human-in-the-loop, where the feedback is largely fed to human devs to improve the agent, but the unlock comes when we figure out how to turn reflection into self-improvement - where the agent takes insights from failures in task list generation and task execution to do better next time. basically, the reflection needs to turn into well-organized context that can be pulled into prompts when, and only when, relevant. this evolves into fine-tuning pieces of the agent, and then agentic RL environments - still feels pretty early here
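a sketch of what "reflection turned into well-organized context" could look like (naive task-type matching stands in for something smarter like embeddings):

```python
# sketch of a reflection store: learnings from failed runs are saved, keyed by
# task type, and only the relevant ones get pulled into future prompts.

learnings = []  # each entry: {"task_type": ..., "lesson": ...}

def record_failure(task_type: str, lesson: str):
    learnings.append({"task_type": task_type, "lesson": lesson})

def relevant_lessons(task_type: str, limit: int = 3):
    # pull lessons only for the kind of task we're about to run
    return [l["lesson"] for l in learnings if l["task_type"] == task_type][:limit]

def build_prompt(task_type: str, task: str) -> str:
    lessons = relevant_lessons(task_type)
    notes = "\n".join(f"- {l}" for l in lessons) or "- none yet"
    return f"lessons from past attempts:\n{notes}\n\ntask: {task}"

if __name__ == "__main__":
    record_failure("crm_update", "company records need a domain, not just a name")
    print(build_prompt("crm_update", "log the new intro in the crm"))
```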
context: imagine being a new employee at a company. you learn a lot during onboarding (and the better the onboarding, the more effective you are out of the gate), and then there’s learning on the job, which breaks down into learning from the org’s experience (“this is how we do things”) and learning from your own experience - the former being more predominant in large orgs. context management is similar. there are layers of context: meta (user/company), project/dept specific, task specific, tool specific, etc. we’ve evolved from simple system prompts to hybrid RAG strategies (vector, keyword, graph), but beyond having the data/context, we need guidance on when and how to retrieve context, which we see early versions of today - but there's lots of room for improvement. this is not merely a technical problem, but also a business issue - you basically need to create an onboarding doc that covers every scenario you expect. as projects get more complicated, it takes more thoughtfulness to correctly prune the context so only relevant information gets included in the prompt, while minimizing irrelevant context.
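a sketch of the layered-context idea (the layers and the relevance scoring here are toy placeholders):

```python
# sketch of layered context: meta (user/org), project, task, and tool context
# live in separate stores, and only pieces scored as relevant to the current
# task make it into the prompt. relevance is naive keyword overlap here.

CONTEXT_LAYERS = {
    "meta":    ["we are a pre-seed generalist fund", "reply tone: casual, lowercase"],
    "project": ["fund ii is actively deploying ~$250k checks"],
    "task":    ["intro requests from portfolio founders get priority"],
    "tool":    ["crm lists: portfolio, LPs, prospects"],
}

def relevance(snippet: str, task: str) -> int:
    # toy score: number of words shared between the snippet and the task
    return len(set(snippet.lower().split()) & set(task.lower().split()))

def build_context(task: str, budget: int = 3):
    scored = [(relevance(s, task), s) for layer in CONTEXT_LAYERS.values() for s in layer]
    scored.sort(reverse=True)
    return [s for score, s in scored if score > 0][:budget]

if __name__ == "__main__":
    print(build_context("draft a reply to a portfolio founder asking for an intro"))
```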
task execution: tasks are usually api calls (requiring auth and an understanding of how to use the api and its underlying data structure - which can be unique, like in a crm or db with custom tables/columns), LLM reasoning (eg summarize), a combination, or even workflow agents*. a research agent is really just web search and summarization in a loop. coding agents are CRUD on your code base, and maybe web search for learning APIs. auth and basic api access feel solved (MCPs fit here), but i’d like to see more around tool-specific context (ask the user, but also analyze upon initial connection: dig into existing data to understand how the tool is used, how the data is structured, and what scenarios/projects the tool is used for). errors/reflection/feedback need to turn into organized learnings that get fed back in as context when relevant. the same tools can be used for different purposes and in different ways between orgs, and we need to capture/document this somehow to execute tasks well.
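a sketch of that breakdown - the executor just dispatches on task type, with every call stubbed:

```python
# sketch of task execution: each task is an api call, an llm step, or a hand-off
# to a workflow agent, and the executor dispatches on type. all calls are stubs.

def call_api(task):
    # stand-in for an authenticated api call (crm update, calendar, etc.)
    return f"api ok: {task['endpoint']}"

def llm_step(task):
    # stand-in for pure llm reasoning, e.g. summarize or classify
    return f"llm output for: {task['prompt'][:40]}"

def workflow_agent(task):
    # stand-in for handing the task to a specialized workflow agent
    return f"workflow '{task['agent']}' finished"

EXECUTORS = {"api": call_api, "llm": llm_step, "agent": workflow_agent}

def execute(task):
    try:
        return {"ok": True, "result": EXECUTORS[task["type"]](task)}
    except Exception as err:
        # failures should become organized learnings, not just logs (see the reflection tweet above)
        return {"ok": False, "error": str(err)}

if __name__ == "__main__":
    print(execute({"type": "llm", "prompt": "summarize this email thread"}))
    print(execute({"type": "api", "endpoint": "/v2/objects/companies"}))
```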
task planning: llm reasoning capabilities work fine for simple task lists that require no private context (like deep research: just a series of web searches while summarizing). if you want to research a lot of entities, deep research doesn’t work as well because the task list management is relatively basic. spreadsheet-based AI tools work better for researching many entities because you’re effectively offloading the task management to the spreadsheet; passing long task lists between prompts doesn’t work here. task management in coding agents works with simple problems, simple code, or when you’re starting from scratch. once you go into more complex pre-existing projects, they are less reliable - and devs increase reliability by documenting how their code works and is organized (.md files), which allows the agent to build better informed task lists. complex code requires more documents, and eventually dynamically pulling only relevant context from those documents. a lot of people/businesses have strong undocumented opinions on the correct order/approach/tools to tackle a project, and we need more approaches to documenting this upfront and on the fly. another reason coding and web-based research agents work well is that they all use the same set of tools, so there’s no need to “learn” how to use those tools (more on this next).
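a sketch of offloading task management to structure (the "spreadsheet" is just a csv here, and the per-cell research call is a stub):

```python
# sketch of spreadsheet-style planning: the plan lives in a table (rows = entities,
# columns = steps) instead of being carried between prompts, so the llm only ever
# sees one cell-sized task at a time.

import csv, io

ENTITIES = ["ollama", "replit", "attio"]
STEPS = ["find homepage", "summarize what it does", "note recent news"]

def run_cell(entity: str, step: str) -> str:
    # stand-in for one llm + web-search call scoped to a single entity/step
    return f"{step} for {entity}"

def run_plan():
    rows = []
    for entity in ENTITIES:
        rows.append({"entity": entity, **{s: run_cell(entity, s) for s in STEPS}})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["entity", *STEPS])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

if __name__ == "__main__":
    print(run_plan())
```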
dynamic agents work well when they can (1) create and track a good plan and (2) execute tasks correctly, while (3) finding the right context to feed into each step (both planning and each task). finally, they need to (4) reflect along the way (either with or without human input) so they can adjust the plan appropriately, and also improve the way they execute failed or poor-performing tasks.
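the four pieces wired together, as a sketch (every call is a stub):

```python
# sketch of the loop above: plan -> (context -> execute -> reflect) per task,
# with the plan adjusted as results come in. all calls are stand-ins.

def plan(goal):
    return [f"step 1 for {goal}", f"step 2 for {goal}"]

def find_context(task):
    return f"context for {task}"

def execute(task, ctx):
    # stand-in for actually doing the work with the retrieved context
    return {"task": task, "ok": True, "output": f"did {task} using {ctx}"}

def reflect(result, tasks):
    # stand-in: on failure, adjust the remaining plan; on success, leave it alone
    return ([f"retry {result['task']}"] + tasks) if not result["ok"] else tasks

def dynamic_agent(goal, max_steps=20):
    tasks, done = plan(goal), []
    while tasks and len(done) < max_steps:
        task = tasks.pop(0)
        result = execute(task, find_context(task))
        done.append(result)
        tasks = reflect(result, tasks)
    return done

if __name__ == "__main__":
    for r in dynamic_agent("clean up my inbox"):
        print(r)
```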