Security Researcher at @theori_io. Flag capturer at @PlaidCTF. Cryptography enjoyer.

Seattle area
Joined September 2013
@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there's no secrets to hide (at least in the AIxCC version 😉)! So, how does our CRS actually find these 0days? 🧵
5
31
1
151
Tim Becker retweeted
the best movie on context engineering
59
151
25
1,739
Tim Becker retweeted
We're investing heavily in building LLM-powered security tooling. Our existing tools work shockingly well, and we're confident we can keep pushing the SOTA. If you're interested in joining us, please reach out!
We are hiring awesome Python developers that want to build cybersecurity tools that leverage LLMs. Contact me if interested.
3
20
Tim Becker retweeted
We are hiring awesome Python developers that want to build cybersecurity tools that leverage LLMs. Contact me if interested.
@tylerni7 @ @HacksInTaiwan 2025 Great talk 👍 Photographed by @byronwai
2
21
Tim Becker retweeted
To add further evidence that MoE scheduling is the prime suspect here: the logprobs values vary discretely. Sampling 100 times generally returns < 10 distinct logprobs values.
1
4
Tim Becker retweeted
The previous thread glossed over how our LLM Agents actually work. The truth is, it took us a long time to figure out how to get reliable and impressive results from agents. By the end, we learned general strategies to build effective LLM agents, which we're now sharing. 🧵
Replying to @tjbecker_
This high confidence allows us to run our exploiter and patcher agents on every vulnerability, often resulting in both a PoC and a Patch. We run multiple copies of each agent and cross-check the results against one-another.
3
15
63
Tim Becker retweeted
Replying to @tjbecker_
one nugget of wisdom I liked from your pod with ctfradio was around a fundamental reason why agents calling agents is important. I think the example was around POVwriter(?) not being allowed to read code, but can call agents to summarize code for it. Using just one agent and having it use tools to gather its own information will flood its context -- often worsening further action outcomes. It's better to have it spin off agents that do the tool uses (read lots of code, summarize) and hand that back to the original agent.
1
2
12
Strategy #4: Adapt to the Models Some models excel at precise instruction following; others need more flexibility to achieve a high-level goal. Also, some models struggle with tool-calling, but you can explore custom tool call formats or (ab)use the `tool_choice` API parameter
1
1
4
Strategy #3: Structure Complex Outputs Make sure your agent knows exactly what it needs to output, including the precise format of that output. Pro tip: you can ask them to output information which you don't plan to use, but that steers them towards certain ways of thinking!
1
1
3
Strategy #2: Curate the Toolset LLM agents repeatedly call tools until they reach their goal, so curating the toolset is crucial. The toolset should be as powerful, focused, and helpful as possible. Put up guardrails to prevent your agents from reaching known dead-ends!
1
1
3
Strategy #1: Decompose the task LLM agents excel at tasks generally requiring human intuition to solve, but they can't yet solve arbitrarily complex multi-step tasks. If the task can be solved in multiple parts, you should decompose it as a workflow of multiple agents.
1
1
5
Tim Becker retweeted
These days, when I see the results of bug hunting using AI, I truly feel glad that I retired early. Theori at aixcc: theori.io/blog/exploring-tra… Google big sleep: issuetracker.google.com/issu… Xbow: xbow.com/blog/top-1-how-xbow…
Tim Becker retweeted
@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there's no secrets to hide (at least in the AIxCC version 😉)! So, how does our CRS actually find these 0days? 🧵
5
31
1
151
Tim Becker retweeted
I’ve looked through the AIxCC repos. If you are going to get started and try to adapt for your use, I suggest looking at @trailofbits and @theori_io CRS first. And ofc anyone who takes my AI Agents for Cybersecurity class will get a deep dive on this along with my own agents!
7
21
1
154
Tim Becker retweeted
The results of AIxCC are truly impressive. I think offensive researchers will soon have to take on LLMs directly. By the way, I’m curious whether current LLM agents (like Big Sleep, @theori_io, @TeamAtlanta24) could find in-the-wild bugs that were discovered in the past — for example, in Mojo, V8, XNU, WebKit, or Linux.
1
4
46
Tim Becker retweeted
All of them have found *novel* 0days in popular projects, so I think I’m less interested in whether they can rediscover known in the wild bugs :)
1
1
7