just came across this again, very interesting read. SEN starts simple with a rules-based relation inference engine, but is open to AI-powered plugins where they make sense and run efficiently locally.
1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral, and leading closed models, including the recent OpenAI GPT-4o and o1-series. arxiv.org/pdf/2410.05229 Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel. #LLM #Reasoning #Mathematics #AGI #Research #Apple
LLMs don't actually "think", how surprising 🤯
Another study according to which large language models can't actually think after all.
Replying to @0x_Todd
This research report shows that current AI reasoning ability isn't much good, so I don't think AI can prove the Riemann hypothesis either.
just got to read the paper... this is why I've personally never used ChatGPT or the like so far. I've always believed that they answer things by pattern matching without logical reasoning. The paper shows that changing a number or modifying trivial details can cause errors in the answer.
LLMs can’t reason! They don’t “think”. They can’t even apply basic logic. They just have a decent ability to “replicate” the responses we expect, which leads to the anthropomorphising of the model, and that becomes popular.
EMNLP 2024 Main: "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners" arxiv.org/abs/2406.11050

TL;DR: The generalization of reasoning capabilities still suffers from token bias. It is probabilistic pattern matching rather than genuine reasoning.

🕑 Tuesday 16:00 - 17:30 📍 Riverfront Hall

We support the findings in 🍎 Apple's trending GSM-Symbolic paper, which has referenced our work to question the true reasoning abilities of LLMs. LLMs perform impressively on reasoning benchmarks 📊, but we wonder: is the language model's performance on reasoning benchmarks a mirage? 🤔

We'll be at EMNLP @emnlpmeeting tomorrow at 4 PM in the poster session 🌟 at Riverfront Hall, sharing our latest research with guidance from @DanRothNLP @camillo_taylor @weijie444 from Penn @PennEngineers @Wharton and @tanwimallick from Argonne @argonne 🤖

We developed a hypothesis-testing framework that tests models on classic logic problems. We perturb tokens, especially those irrelevant to the underlying logic, and observe statistically significant shifts. We call this token bias 💡

For instance, the famous "Linda Problem" in psychology 👩‍⚕️ is usually answered correctly. However, change it to the "Bob Problem" 👨‍⚕️ and the performance shifts. Similarly, we swap "horses 🐎" for "bunnies 🐰" in the "twenty-five horses problem" in graph theory. These changes don't affect the underlying logic but highlight memorization 🧠 over genuine reasoning 🤔

Come chat with us at the session!
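The perturbation protocol is easy to sketch in code. Below is a minimal illustration in Python of the general idea, not the paper's actual harness: swap logic-irrelevant tokens (a name, an animal) in a fixed problem template and compare correct-answer rates across variants. `ask_model`, the template, and the trial count are all hypothetical placeholders.

```python
from string import Template

# Toy token-perturbation test in the spirit of the token-bias framework:
# the entity tokens vary, the underlying logic does not, so any accuracy
# shift across variants signals surface-level pattern matching.
PROBLEM = Template(
    "$name has 25 $animal and a track that can race 5 $animal at a time. "
    "With no stopwatch, how many races are needed to find the 3 fastest?"
)
CORRECT = "7"  # the classic twenty-five horses answer

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; the dummy reply below
    just lets the script run end to end."""
    return "The answer is 7."

def accuracy(name: str, animal: str, trials: int = 20) -> float:
    prompt = PROBLEM.substitute(name=name, animal=animal)
    hits = sum(CORRECT in ask_model(prompt) for _ in range(trials))
    return hits / trials

# Compare a familiar surface form against logic-preserving perturbations.
for name, animal in [("Linda", "horses"), ("Bob", "horses"), ("Linda", "bunnies")]:
    print(f"{name}/{animal}: {accuracy(name, animal):.0%} correct")
```

With a real model behind `ask_model`, a gap between the Linda/horses row and the perturbed rows is exactly the token bias the poster describes.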
Replying to @gps65 @Yatsa_Man
It's also worth noting that doubts are emerging over whether the LLMs that are the foundation of today's AI might be a dead end. Here's a whole thread on the topic:
This has nothing to do with it, but it's of great interest (though I fear you won't like the conclusions 😜)
Can LLMs reason? A study by the other Bengio (no relation)
Yet another experiment showing that LLMs are terrible at reasoning. In this experiment, Apple's research group used mathematical reasoning tasks to test LLMs and "discovered" three things:
1. If you increase the number of reasoning pieces, performance drops.
2. If you change anything in the original problem, performance drops.
3. If you add irrelevant information that looks relevant, performance drops.
In conclusion: the models tend to replicate the reasoning steps observed in their training data rather than performing genuine logical reasoning.
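Those three manipulations correspond to simple template operations. Here is a minimal sketch in Python, assuming a made-up GSM8K-style template rather than the paper's actual generator: names and numbers are drawn symbolically (manipulation 2) and a plausible-looking but irrelevant clause can be appended (manipulation 3), in the spirit of GSM-Symbolic and GSM-NoOp.

```python
import random

# Made-up GSM8K-style template in the spirit of GSM-Symbolic:
# surface tokens vary per draw, the arithmetic structure stays fixed.
NAMES = ["Sophie", "Liam", "Mei", "Omar"]
FRUITS = ["apples", "kiwis", "plums"]
NOOP = " Five of them were a bit smaller than average."  # logically irrelevant

def make_variant(with_noop: bool = False) -> tuple[str, int]:
    name, fruit = random.choice(NAMES), random.choice(FRUITS)
    a, b = random.randint(20, 60), random.randint(5, 15)
    question = (
        f"{name} picked {a} {fruit} on Friday and {b} more on Saturday."
        + (NOOP if with_noop else "")
        + f" How many {fruit} does {name} have in total?"
    )
    return question, a + b  # ground truth is unchanged by the no-op clause

question, answer = make_variant(with_noop=True)
print(question, "->", answer)
```

Scoring a model across many such draws, with and without the no-op clause, is what surfaces the performance drops listed above.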
Well, someone finally went and wrote a paper to PROVE something that was fairly obvious (to me at least, and I've been saying it from day one): LLMs DO NOT REASON. A bunch of hypebeasts buying smoke in industrial quantities. Their architecture simply doesn't allow it.
Can humans truly reason? Or are they just sophisticated pattern matchers?

The distinction makes no sense. What reasoning is *is* sophisticated pattern matching. Even formal mathematical logic is a set of pattern-matching rules.

The paper demonstrates a benchmark showing incredible advances in AI reasoning skills, and then concludes "We hypothesize that [...] LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data."

How do you define "genuine logical reasoning"? How is it different from replicating other people's reasoning steps? There is no definition in the paper. Does being capable of genuine logical reasoning mean scoring 100% on the benchmark?

ChatGPT o1-preview scores 77.4% on their hardest benchmark, GSM-NoOp, math problems that try to trick you with confusing extra information. So it was able to reason through most of the problems. Do humans score better than 77.4%?

The correct conclusion is "LLMs have made large advances in reasoning skills but still make mistakes".
** Intern position on LLM reasoning **

@mchorton1991, @i_mirzadeh, @KeivanAlizadeh2 and I are co-hosting an intern position at #Apple to work on understanding and improving the reasoning capabilities of LLMs.

The ideal candidate:
- Has prior publications on LLM reasoning
- Is fluent in PyTorch and training LLMs
- Has work authorization in the US and permission to start in 2025. Availability for a long internship is highly preferred.

If you're excited about #LLM #reasoning and find it the challenge of your career, this is right for you. Please send an email with your CV to MIND_Internship_2025@group.apple.com with the subject line "Internship 2025" and highlight at the top of your email: 1) your prior experience/papers in LLM reasoning, and 2) your availability in 2025 for a long internship (preferred but not required).

Apologies in advance for not being able to respond to each email individually, but rest assured we will review all of them.
basically LLMs are "Hume-style" reasoners: not capable of synthetic a priori judgments, i.e. of determining necessity and therefore causality (they are hard empiricists), so the grounding of all logic for LLMs is repetition of experience, aka "just because". we knew this
Replying to @Hesamation
They can’t produce new data. On the reasoning benchmarks, when Apple’s researchers changed the names in the tests, accuracy dropped by 5% to 15%. That included o1 as well.
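As a rough sketch of how a drop like that is measured: score the same model on matched original and name-swapped problem sets and compare the per-draw accuracies. The numbers below are invented for illustration, not the paper's results.

```python
from statistics import mean, stdev

# Invented per-draw accuracies (fraction correct) for one model on, say,
# eight matched draws of original vs. name-swapped problems.
original = [0.92, 0.90, 0.93, 0.91, 0.92, 0.89, 0.94, 0.90]
name_swapped = [0.84, 0.79, 0.86, 0.80, 0.83, 0.77, 0.85, 0.81]

drops = [o - s for o, s in zip(original, name_swapped)]
print(f"mean drop: {mean(drops):.1%} (±{stdev(drops):.1%} across draws)")
```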