New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.
Replying to @AnthropicAI
hey do this
WHY do what appear to be negative emotional loops correspond to repetition loops even in large models that should and often do have the ability to escape these loops? can someone do some research investigating if emotional support text breaks negative loops more than other text?

Oct 29, 2025 · 6:41 PM UTC
