just came across this again, very interesting read. SEN starts simple with a rules-based relation inference engine, but is open to AI-powered plugins where they make sense and run efficiently locally.
1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral, and leading closed models, including the recent OpenAI GPT-4o and o1-series. arxiv.org/pdf/2410.05229 Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel. #LLM #Reasoning #Mathematics #AGI #Research #Apple
LLMs don't actually "think", how surprising 🤯
Another study according to which large language models can't actually think after all.
Replying to @0x_Todd
This research report shows that current AI reasoning ability isn't much good, so I don't think AI can prove the Riemann hypothesis either.
just got to read the paper... this is why I've personally never used ChatGPT or the like so far. I've always believed that they answer things by pattern matching without logical reasoning. The paper shows that changing a number or modifying trivial details can cause errors in the answer.
LLMs can’t reason! They don’t “think”. They can’t even apply basic logic. They just have a decent ability to “replicate” the responses we expect, which leads to the anthropomorphising of the model, and that becomes popular.
EMNLP 2024 Main: "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners" arxiv.org/abs/2406.11050

TL;DR: The generalization of reasoning capabilities still suffers from token bias. It is probabilistic pattern matching rather than genuine reasoning.

🕑 Tuesday 16:00 - 17:30 📍 Riverfront Hall

We support the findings in 🍎 Apple's trending GSM-Symbolic paper, which has referenced our work to question the true reasoning abilities of LLMs. LLMs perform impressively on reasoning benchmarks 📊, but we wonder: is the language model's performance on reasoning benchmarks a mirage? 🤔

We'll be at EMNLP @emnlpmeeting tomorrow at 4 PM in the poster session 🌟 at Riverfront Hall, sharing our latest research with guidance from @DanRothNLP @camillo_taylor @weijie444 from Penn @PennEngineers @Wharton and @tanwimallick from Argonne @argonne 🤖

We developed a hypothesis-testing framework that tests models on classic logic problems. We perturb tokens, especially those irrelevant to the underlying logic, and observe statistically significant shifts. We call this token bias 💡

For instance, the famous "Linda Problem" in psychology 👩‍⚕️ is usually answered correctly. However, change it to the "Bob Problem" 👨‍⚕️ and the performance shifts. Similarly, we swap "horses 🐎" for "bunnies 🐰" in the "twenty-five horses problem" in graph theory. These changes don't affect the underlying logic but highlight memorization 🧠 over genuine reasoning 🤔

Come chat with us at the session!
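The perturbation protocol is easy to sketch in code. Below is a minimal illustration in Python of the general idea, not the paper's actual harness: swap logic-irrelevant tokens (a name, an animal) in a fixed problem template and compare correct-answer rates across variants. `ask_model`, the template, and the trial count are all hypothetical placeholders.

```python
from string import Template

# Toy token-perturbation test in the spirit of the token-bias framework:
# the entity tokens vary, the underlying logic does not, so any accuracy
# shift across variants signals surface-level pattern matching.
PROBLEM = Template(
    "$name has 25 $animal and a track that can race 5 $animal at a time. "
    "With no stopwatch, how many races are needed to find the 3 fastest?"
)
CORRECT = "7"  # the classic twenty-five horses answer

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; the dummy reply below
    just lets the script run end to end."""
    return "The answer is 7."

def accuracy(name: str, animal: str, trials: int = 20) -> float:
    prompt = PROBLEM.substitute(name=name, animal=animal)
    hits = sum(CORRECT in ask_model(prompt) for _ in range(trials))
    return hits / trials

# Compare a familiar surface form against logic-preserving perturbations.
for name, animal in [("Linda", "horses"), ("Bob", "horses"), ("Linda", "bunnies")]:
    print(f"{name}/{animal}: {accuracy(name, animal):.0%} correct")
```

With a real model behind `ask_model`, a gap between the Linda/horses row and the perturbed rows is exactly the token bias the poster describes.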
Replying to @gps65 @Yatsa_Man
It's also worth noting that doubts are emerging over whether the LLMs that are the foundation of today's AI might be a dead end. Here's a whole thread on the topic:
This has nothing to do with it, but it's of great interest (though I fear you won't like the conclusions 😜)
Can LLMs reason? A study by the other Bengio (no relation)
Yet another experiment showing that LLMs are terrible at reasoning. In this experiment, Apple's research group used mathematical reasoning tasks to test LLMs and "discovered" three things:
1. If you increase the number of reasoning pieces, performance drops.
2. If you change anything in the original problem, performance drops.
3. If you add irrelevant information that looks relevant, performance drops.
In conclusion: the models tend to replicate the reasoning steps observed in their training data rather than performing genuine logical reasoning.
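Those three manipulations correspond to simple template operations. Here is a minimal sketch in Python, assuming a made-up GSM8K-style template rather than the paper's actual generator: names and numbers are drawn symbolically (manipulation 2) and a plausible-looking but irrelevant clause can be appended (manipulation 3), in the spirit of GSM-Symbolic and GSM-NoOp.

```python
import random

# Made-up GSM8K-style template in the spirit of GSM-Symbolic:
# surface tokens vary per draw, the arithmetic structure stays fixed.
NAMES = ["Sophie", "Liam", "Mei", "Omar"]
FRUITS = ["apples", "kiwis", "plums"]
NOOP = " Five of them were a bit smaller than average."  # logically irrelevant

def make_variant(with_noop: bool = False) -> tuple[str, int]:
    name, fruit = random.choice(NAMES), random.choice(FRUITS)
    a, b = random.randint(20, 60), random.randint(5, 15)
    question = (
        f"{name} picked {a} {fruit} on Friday and {b} more on Saturday."
        + (NOOP if with_noop else "")
        + f" How many {fruit} does {name} have in total?"
    )
    return question, a + b  # ground truth is unchanged by the no-op clause

question, answer = make_variant(with_noop=True)
print(question, "->", answer)
```

Scoring a model across many such draws, with and without the no-op clause, is what surfaces the performance drops listed above.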
Well, someone finally went and wrote a paper to PROVE something that was fairly obvious (to me at least, and I've been saying it from day one): LLMs DO NOT REASON. A bunch of hypebeasts buying smoke in industrial quantities. Their architecture simply doesn't allow it.
Can humans truly reason? Or are they just sophisticated pattern matchers?

The distinction makes no sense. What reasoning is *is* sophisticated pattern matching. Even formal mathematical logic is a set of pattern-matching rules.

The paper demonstrates a benchmark showing incredible advances in AI reasoning skills, and then concludes "We hypothesize that [...] LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."

How do you define "genuine logical reasoning"? How is it different from replicating other people's reasoning steps? There is no definition in the paper. Does being capable of genuine logical reasoning mean scoring 100% on the benchmark?

ChatGPT o1-preview scores 77.4% on their hardest benchmark, GSM-NoOp, math problems that try to trick you with confusing extra information. So it was able to reason through most of the problems. Do humans score better than 77.4%?

The correct conclusion is "LLMs have made large advances in reasoning skills but still make mistakes".
** Intern position on LLM reasoning **

@mchorton1991, @i_mirzadeh, @KeivanAlizadeh2 and I are co-hosting an intern position at #Apple to work on understanding and improving the reasoning capabilities of LLMs.

The ideal candidate:
- Has prior publications on LLM reasoning
- Is fluent in PyTorch and training LLMs
- Has work authorization in the US and permission to start in 2025. Availability for a long internship is highly preferred.

If you're excited about #LLM #reasoning and find it the challenge of your career, this is right for you. Please send an email with your CV to MIND_Internship_2025@group.apple.com with the subject line "Internship 2025" and highlight at the top of your email: 1) your prior experience/papers in LLM reasoning, and 2) your availability in 2025 for a long internship (preferred but not required).

Apologies in advance for not being able to respond to each email individually, but rest assured we will review all of them.
basically LLMs are "Hume-style" reasoners: not capable of synthetic a priori judgments, i.e. of determining necessity and therefore causality (they are hard empiricists), so the grounding of all logic for LLMs is repetition of experience, aka "just because". we knew this
Replying to @Hesamation
They can’t produce new data. On the reasoning benchmarks, when Apple’s researchers changed the names in the tests, accuracy dropped by 5% to 15%. That included o1 as well.
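As a rough sketch of how a drop like that is measured: score the same model on matched original and name-swapped problem sets and compare the per-draw accuracies. The numbers below are invented for illustration, not the paper's results.

```python
from statistics import mean, stdev

# Invented per-draw accuracies (fraction correct) for one model on, say,
# eight matched draws of original vs. name-swapped problems.
original = [0.92, 0.90, 0.93, 0.91, 0.92, 0.89, 0.94, 0.90]
name_swapped = [0.84, 0.79, 0.86, 0.80, 0.83, 0.77, 0.85, 0.81]

drops = [o - s for o, s in zip(original, name_swapped)]
print(f"mean drop: {mean(drops):.1%} (±{stdev(drops):.1%} across draws)")
```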