sohail · Nov 8, 2025 · 2:20 PM UTC

sohail

@Sohailm25

11h

oh mlx kernels, I heavily underestimated your willingness to cooperate. grateful bc it has been interesting learning how to profile a kernel

sohail

@Sohailm25

Nov 7

jk, i did not consider quantization in this. already running into an issue with supporting int4 quantized models. back to the drawing board

sohail · Nov 8, 2025 · 2:32 AM UTC

sohail

@Sohailm25

23h

bless tmux

sohail · Nov 8, 2025 · 2:24 AM UTC

sohail

@Sohailm25

23h

i'll check back in next week on progress

sohail · Nov 8, 2025 · 2:23 AM UTC

sohail

@Sohailm25

23h

made the mistake of not allocating enough system mem to the pod. sigh, restarting the vllm build

sohail · Nov 7, 2025 · 10:15 PM UTC

sohail

@Sohailm25

Nov 7

i’ll do you one better: being faked out by the transparency is peak dopamine

Lauren Frailey

@laurenfrailey1

Nov 7

no form of nicotine can ever replicate this level of a dopamine hit

sohail · Nov 7, 2025 · 9:21 PM UTC

sohail

@Sohailm25

Nov 7

jk, i did not consider quantization in this. already running into an issue with supporting int4 quantized models. back to the drawing board

sohail

@Sohailm25

Nov 7

guys guys guys it's almost there

sohail · Nov 7, 2025 · 8:29 PM UTC

sohail

@Sohailm25

Nov 7

so I just learned GRPO, GSPO, and CISPO are from chinese open source. what’s been the US’s contribution?

sohail · Nov 7, 2025 · 7:56 PM UTC

sohail

@Sohailm25

Nov 7

btw, static is sequentially working on prompts as they come in, and continuous is paged kv and continuous batching
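
A rough sketch of that difference, counting decode steps only. This is a toy model, not mlx-lm or vllm code: `static_steps`, `continuous_steps`, and `max_batch` are hypothetical names, each request is reduced to the number of tokens it still has to generate, and prefill cost and paged kv are ignored.

```python
from collections import deque


def static_steps(lengths):
    """Serve requests one at a time, in arrival order.

    Each request runs to completion before the next starts, so the
    total number of decode steps is the sum of all output lengths.
    """
    return sum(lengths)


def continuous_steps(lengths, max_batch):
    """Decode one token per step for every active request.

    A finished request frees its slot right away, and a waiting
    request is admitted on the next step (assumes all lengths > 0).
    """
    waiting = deque(lengths)
    active = []  # remaining tokens for each in-flight request
    steps = 0
    while waiting or active:
        # Fill any free slots from the waiting queue.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step: every active request emits one token.
        active = [remaining - 1 for remaining in active]
        # Drop finished requests so their slots can be reused.
        active = [remaining for remaining in active if remaining > 0]
        steps += 1
    return steps
```

With `max_batch=1` the continuous loop degenerates to the static case, which is a quick sanity check on the model.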

sohail · Nov 7, 2025 · 7:51 PM UTC

sohail

@Sohailm25

Nov 7

guys guys guys it's almost there

sohail

@Sohailm25

Nov 7

mlx core work is complete - paged attention and kv is working. now to integrate the extended mlx kernels with mlx-lm, this is the real determining step
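
For context, the bookkeeping side of a paged kv cache can be sketched in a few lines. This is a generic illustration of the idea, not the mlx kernels from this thread: `PagedKVCache`, `append_token`, and `release` are hypothetical names, and only page allocation is modeled, with no tensors or attention math.

```python
class PagedKVCache:
    """Toy block table: maps each sequence to fixed-size physical pages."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))  # unused physical page ids
        self.block_tables = {}  # seq_id -> list of physical page ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token and return (page, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        # A new page is needed only when the last one is full (or none exists).
        if n % self.page_size == 0:
            if not self.free_pages:
                raise MemoryError("no free KV pages")
            table.append(self.free_pages.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.page_size], n % self.page_size

    def release(self, seq_id):
        """Return all pages of a finished sequence to the free list."""
        self.free_pages.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because pages are handed out on demand and returned on `release`, a finished sequence's memory can be reused by a newly admitted one, which is what makes continuous batching practical.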

sohail · Nov 7, 2025 · 5:54 PM UTC

sohail

@Sohailm25

Nov 7

is there a verifiers env for this @PrimeIntellect

IroncladDev

@IroncladDev

Nov 7

AI trainers after a small single-purpose model accidentally identifies cancer cells with a 51% success rate

sohail · Nov 7, 2025 · 5:27 PM UTC

sohail

@Sohailm25

Nov 7

reward hacking but it works

sohail · Nov 7, 2025 · 5:27 PM UTC

sohail

@Sohailm25

Nov 7

an rl researcher's wet dream

IroncladDev

@IroncladDev

Nov 7

AI trainers after a small single-purpose model accidentally identifies cancer cells with a 51% success rate

sohail · Nov 7, 2025 · 4:56 PM UTC

sohail

@Sohailm25

Nov 7

learning live what compiling vllm entails

sohail · Nov 7, 2025 · 4:17 PM UTC

sohail

@Sohailm25

Nov 7

mlx core work is complete - paged attention and kv is working. now to integrate the extended mlx kernels with mlx-lm, this is the real determining step

sohail

@Sohailm25

Nov 7

kernels built and tested. now trying to enable profiling in mlx, surprisingly the hardest task so far lmao

sohail · Nov 7, 2025 · 3:36 PM UTC

sohail

@Sohailm25

Nov 7

good god, what has brought me to looking at this? answer is mlx

sohail · Nov 7, 2025 · 2:42 PM UTC

sohail

@Sohailm25

Nov 7

Parallel Web Systems

@p0

Nov 6

Today, we’re launching the Parallel Search API, the most accurate web search for AI agents, built using our proprietary web index and retrieval infrastructure. Traditional search ranks URLs for humans to click. AI search needs something different: the right tokens in their context window. Parallel’s Search API is built from the ground up for AI agents, resulting in higher accuracy, lower costs, and lower latency for end-to-end agentic workflows.

sohail · Nov 7, 2025 · 2:37 PM UTC

sohail

@Sohailm25

Nov 7

might have to cop a kimi sub

sohail · Nov 7, 2025 · 2:10 PM UTC

sohail

@Sohailm25

Nov 7

as an aside on trying a novel eigen-based method for summarizing kv pages returned such that we can early stop if it's irrelevant: vllm build takes centuries, even on an RTX 2000 Ada

sohail

@Sohailm25

Nov 6

i've learned a lot about vllm in the past 18 hours of trying to address this

sohail · Nov 7, 2025 · 2:09 PM UTC

sohail

@Sohailm25

Nov 7

kernels built and tested. now trying to enable profiling in mlx, surprisingly the hardest task so far lmao

sohail

@Sohailm25

Nov 6

finally implemented the continuous batching in mlx-lm. now working on the actual paged attention kernel in mlx woopwoop

sohail · Nov 7, 2025 · 12:24 PM UTC

sohail

@Sohailm25

Nov 7

spoke too soon: i afk'd and it timed out