Tim Becker · Aug 13, 2025 · 5:09 PM UTC

Tim Becker

Pinned Tweet

Tim Becker

@tjbecker_

Aug 13

@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there's no secrets to hide (at least in the AIxCC version 😉)! So, how does our CRS actually find these 0days? 🧵

151

Peter J. Liu · Oct 25, 2025 · 3:00 PM UTC

Tim Becker retweeted

Peter J. Liu

@peterjliu

Oct 25

the best movie on context engineering

151

1,739

Tim Becker · Oct 4, 2025 · 6:44 AM UTC

Tim Becker retweeted

Tim Becker

@tjbecker_

Oct 4

We're investing heavily in building LLM-powered security tooling. Our existing tools work shockingly well, and we're confident we can keep pushing the SOTA. If you're interested in joining us, please reach out!

Andrew Wesie @andrewwesie

Oct 3

We are hiring awesome Python developers that want to build cybersecurity tools that leverage LLMs. Contact me if interested.

Andrew Wesie · Oct 3, 2025 · 4:11 AM UTC

Tim Becker retweeted

Andrew Wesie @andrewwesie

Oct 3

We are hiring awesome Python developers that want to build cybersecurity tools that leverage LLMs. Contact me if interested.

Mr. Anthony 安東尼 · Aug 16, 2025 · 3:00 AM UTC

Tim Becker retweeted

Mr. Anthony 安東尼

@darkfloyd1014

Aug 16

@tylerni7 @ @HacksInTaiwan 2025 Great talk 👍 Photographed by @byronwai

Tim Becker · Aug 13, 2025 · 11:39 PM UTC

Tim Becker retweeted

Tim Becker

@tjbecker_

Aug 13

To add further evidence that MoE scheduling is the prime suspect here: the logprobs values vary discretely. Sampling 100 times generally returns < 10 distinct logprobs values.

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker retweeted

Tim Becker

@tjbecker_

Aug 14

The previous thread glossed over how our LLM Agents actually work. The truth is, it took us a long time to figure out how to get reliable and impressive results from agents. By the end, we learned general strategies to build effective LLM agents, which we're now sharing. 🧵

Tim Becker

@tjbecker_

Aug 13

Replying to @tjbecker_

This high confidence allows us to run our exploiter and patcher agents on every vulnerability, often resulting in both a PoC and a Patch. We run multiple copies of each agent and cross-check the results against one-another.

plotchy🔅 · Aug 14, 2025 · 8:37 PM UTC

Tim Becker retweeted

plotchy🔅

@plotchy

Aug 14

Replying to @tjbecker_

one nugget of wisdom I liked from your pod with ctfradio was around a fundamental reason why agents calling agents is important. I think the example was around POVwriter(?) not being allowed to read code, but can call agents to summarize code for it. Using just one agent and having it use tools to gather its own information will flood its context -- often worsening further action outcomes. It's better to have it spin off agents that do the tool uses (read lots of code, summarize) and hand that back to the original agent.

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker

@tjbecker_

Aug 14

This was just a quick summary! For many more details, including specific examples of each strategy in our CRS, check out our blog post: theori.io/blog/building-effe…

Building Effective LLM Agents | AI Cyber Challenge - Theori BLOG

How we learned to build effective LLM agents for hacking at DARPA's AI Cyber Challenge (AIxCC) | AI for Security, AIxCC

theori.io

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker

@tjbecker_

Aug 14

Strategy #4: Adapt to the Models Some models excel at precise instruction following; others need more flexibility to achieve a high-level goal. Also, some models struggle with tool-calling, but you can explore custom tool call formats or (ab)use the `tool_choice` API parameter

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker

@tjbecker_

Aug 14

Strategy #3: Structure Complex Outputs Make sure your agent knows exactly what it needs to output, including the precise format of that output. Pro tip: you can ask them to output information which you don't plan to use, but that steers them towards certain ways of thinking!

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker

@tjbecker_

Aug 14

Strategy #2: Curate the Toolset LLM agents repeatedly call tools until they reach their goal, so curating the toolset is crucial. The toolset should be as powerful, focused, and helpful as possible. Put up guardrails to prevent your agents from reaching known dead-ends!

Tim Becker · Aug 14, 2025 · 7:51 PM UTC

Tim Becker

@tjbecker_

Aug 14

Strategy #1: Decompose the task LLM agents excel at tasks generally requiring human intuition to solve, but they can't yet solve arbitrarily complex multi-step tasks. If the task can be solved in multiple parts, you should decompose it as a workflow of multiple agents.

beist · Aug 13, 2025 · 5:58 AM UTC

Tim Becker retweeted

beist @beist

Aug 13

These days, when I see the results of bug hunting using AI, I truly feel glad that I retired early. Theori at aixcc: theori.io/blog/exploring-tra… Google big sleep: issuetracker.google.com/issu… Xbow: xbow.com/blog/top-1-how-xbow…

Tim Becker · Aug 13, 2025 · 5:09 PM UTC

Tim Becker retweeted

Tim Becker

@tjbecker_

Aug 13

151

Tim Becker · Aug 14, 2025 · 6:55 AM UTC

Tim Becker

@tjbecker_

Aug 14

RT @theori_io: So, how did our #AIxCC finalist RoboDuck actually pull it off? Check out the full details and real execution logs on how we…

Building Effective LLM Agents | AI Cyber Challenge - Theori BLOG

How we learned to build effective LLM agents for hacking at DARPA's AI Cyber Challenge (AIxCC) | AI for Security, AIxCC

theori.io

Richard Johnson · Aug 14, 2025 · 1:26 AM UTC

Tim Becker retweeted

Richard Johnson

@richinseattle

Aug 14

I’ve looked through the AIxCC repos. If you are going to get started and try to adapt for your use, I suggest looking at @trailofbits and @theori_io CRS first. And ofc anyone who takes my AI Agents for Cybersecurity class will get a deep dive on this along with my own agents!

154

ohjin · Aug 14, 2025 · 1:53 AM UTC

Tim Becker retweeted

ohjin

@pwn_expoit

Aug 14

The results of AIxCC are truly impressive. I think offensive researchers will soon have to take on LLMs directly. By the way, I’m curious whether current LLM agents (like Big Sleep, @theori_io, @TeamAtlanta24) could find in-the-wild bugs that were discovered in the past — for example, in Mojo, V8, XNU, WebKit, or Linux.

Brendan Dolan-Gavitt · Aug 14, 2025 · 5:59 AM UTC

Tim Becker retweeted

Brendan Dolan-Gavitt

@moyix

Aug 14

Replying to @pwn_expoit @theori_io @TeamAtlanta24

All of them have found *novel* 0days in popular projects, so I think I’m less interested in whether they can rediscover known in the wild bugs :)