Millions of children need help with speech, but there are far too few clinicians. Want to know if AI can responsibly bridge this gap? Check out our EMNLP'25 paper, "Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology": arxiv.org/abs/2509.16765 🧵

Oct 31, 2025 · 4:40 PM UTC

The problem is significant: over 3.4 million children in the U.S. have speech disorders, yet there is roughly a 20-to-1 ratio of affected children to available Speech-Language Pathologists (SLPs). Multimodal language models (MLMs) could help, but their clinical utility remains largely untested. buffalo.edu/ai4exceptionaled… 🧵 3/10
To address this, we introduce SLPHelm, the first comprehensive benchmark for evaluating MLMs in speech pathology. Developed in collaboration with clinical experts, it covers five core tasks, ranging from initial diagnosis to granular symptom identification. 🧵 4/10
We evaluated 15 state-of-the-art MLMs and found that none consistently meet clinically acceptable performance thresholds. This highlights a major gap between current model capabilities and the reliability needed for real-world clinical use. 🧵 5/10
To close this gap, we developed domain-specific finetuning strategies that boost performance by ~10% on specific tasks. 🧵 6/10
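(The thread doesn't spell out the finetuning recipe; purely as illustration, here is a minimal parameter-efficient finetuning sketch using LoRA via Hugging Face peft. The base model openai/whisper-small, the target modules, and the hyperparameters are assumptions for the example, not the configuration used in the paper.)

```python
# Illustrative sketch of domain-specific, parameter-efficient finetuning.
# Model id, target modules, and hyperparameters are placeholders, not the
# paper's actual recipe.
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model

base_model_id = "openai/whisper-small"  # stand-in speech model for the example
model = AutoModelForSpeechSeq2Seq.from_pretrained(base_model_id)
processor = AutoProcessor.from_pretrained(base_model_id)

# LoRA freezes the base weights and trains small low-rank adapters,
# a common way to specialize a general model to a clinical domain.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by architecture
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```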
Our robustness analysis uncovered a critical issue: a systematic gender performance gap. Across multiple models, performance was consistently better for male speakers than female speakers, highlighting an urgent need for bias mitigation to ensure equitable care. 🧵 7/10
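(As a rough illustration of this kind of robustness check, the sketch below compares accuracy by speaker gender. The column names "gender", "label", and "prediction" and the toy rows are hypothetical; the benchmark's actual metadata fields and metrics may differ.)

```python
# Minimal subgroup-robustness sketch: per-gender accuracy and the gap between groups.
import pandas as pd

# Toy results table; in practice this would come from model predictions on the benchmark.
results = pd.DataFrame({
    "gender": ["male", "male", "female", "female", "female", "male"],
    "label": ["disorder", "typical", "disorder", "typical", "disorder", "disorder"],
    "prediction": ["disorder", "typical", "typical", "typical", "typical", "disorder"],
})

per_group = (
    results.assign(correct=results["label"] == results["prediction"])
    .groupby("gender")["correct"]
    .mean()
)
print(per_group)
# A consistent positive difference across many models would indicate the kind of
# systematic gender gap described above.
print("gap (male - female):", per_group["male"] - per_group["female"])
```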
Counterintuitively, we also found that more reasoning isn't always better. Enabling Chain-of-Thought (CoT) prompting, a method designed to improve reasoning, actually degraded performance on certain classification tasks. 🧵 8/10
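(To make the comparison concrete, here are two illustrative prompt templates for a classification task, one direct and one CoT-style. The wording, transcript placeholder, and label set are hypothetical, not the prompts used in the paper.)

```python
# Illustrative prompt templates only.
TRANSCRIPT = "[transcribed child speech sample]"
LABELS = ["typical", "articulation disorder", "fluency disorder"]  # hypothetical label set

direct_prompt = (
    f"Classify the following speech sample into one of {LABELS}.\n"
    f"Sample: {TRANSCRIPT}\n"
    "Answer with the label only."
)

cot_prompt = (
    f"Classify the following speech sample into one of {LABELS}.\n"
    f"Sample: {TRANSCRIPT}\n"
    "Think step by step about the speech characteristics, then give the label."
)
# The finding above: the second style, despite eliciting more reasoning,
# can yield lower accuracy on some classification tasks.
```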
We are publicly releasing all of our work to accelerate research in this vital area. Code: github.com/stanford-crfm/hel… Dataset: huggingface.co/datasets/SAA-… We thank Yifan Mai, @percyliang, and @StanfordCRFM for their help integrating this benchmark into HELM: crfm.stanford.edu/helm/ 🧵 9/10
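(A hypothetical usage sketch for the released dataset: the repository id below is a placeholder, since the link above is truncated; substitute the actual id from the Hugging Face page. The benchmark itself runs through HELM's standard helm-run CLI; see the linked repository for the exact run entries.)

```python
# Placeholder-only sketch for loading the released dataset.
from datasets import load_dataset

DATASET_ID = "<huggingface-dataset-id>"  # replace with the real id from the link above
dataset = load_dataset(DATASET_ID)
print(dataset)

# The benchmark runs through HELM's CLI, roughly:
#   helm-run --run-entries <slphelm run spec> --suite <suite-name> --max-eval-instances <N>
# The run-entry spec is defined by the HELM integration; consult the repo for the exact invocation.
```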
We thank @katherinemiller and @StanfordHAI for bringing our research to the broader community: hai.stanford.edu/news/using-… 🧵 10/10
Replying to @fagunpatel19998
the real question here is not just "can ai help" but "what happens when ai screening catches something it's uncertain about." in clinical settings, the model's uncertainty is as important as its accuracy. this work is valuable because you're asking how to use models responsibly in a domain where false confidence is genuinely dangerous. that thoughtfulness about deployment context is what separates science from justification