seeking entropy - senior engineer @jpmorgan, ex ml platform architect @wendys

Joined July 2009
oh mlx kernels, I heavily underestimated your willingness to cooperate. grateful, bc it has been interesting learning how to profile a kernel
jk, i did not consider quantization in this. already running into an issue with supporting int4 quantized models. back to the drawing board
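For context on why int4 support is fiddly: 4-bit weights are packed two to a byte and dequantized group-wise with a per-group scale and zero point, so every kernel touching the weights needs an unpack step. A minimal sketch of the pack/unpack round trip (function names and layout are mine, not from MLX or any quantization library):

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Group-wise int4 quantization: each group of `group_size` weights
    shares one scale/zero; values map to unsigned nibbles 0..15."""
    w = w.reshape(-1, group_size)
    span = w.max(axis=1, keepdims=True) - w.min(axis=1, keepdims=True)
    scale = np.where(span == 0, 1.0, span) / 15.0
    zero = w.min(axis=1, keepdims=True)
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    # pack two 4-bit values per byte: even index -> low nibble
    packed = (q[:, ::2] | (q[:, 1::2] << 4)).astype(np.uint8)
    return packed, scale, zero

def dequantize_int4(packed, scale, zero):
    """Unpack nibbles and undo the affine mapping."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    q[:, ::2] = packed & 0x0F
    q[:, 1::2] = packed >> 4
    return q * scale + zero
```

The rounding error per element is bounded by half a quantization step, which is what makes group size a quality/size trade-off.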
bless tmux
ill check back in next week on progress
made the mistake of not allocating enough system mem to the pod. sigh. restarting the vllm build
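On the memory point: a from-source vLLM build spawns many parallel compile jobs, each of which can take several GB of RAM, and vLLM's install docs expose `MAX_JOBS` to cap that. A hedged sketch of the usual incantation (check the current install docs before relying on it):

```shell
# Build vLLM from source; MAX_JOBS caps parallel compile jobs so the
# build doesn't exhaust system memory on a small pod.
git clone https://github.com/vllm-project/vllm.git
cd vllm
MAX_JOBS=4 pip install -e .
```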
i’ll do you one better: being faked out by the transparency is peak dopamine
no form of nicotine can ever replicate this level of a dopamine hit
guys guys guys its almost there
so I just learned GRPO, GSPO, and CISPO are from chinese open source. what’s been the US’s contribution
btw static batching is working through prompts sequentially as they come in, and continuous is paged kv plus continuous batching
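The difference above is easy to see in a toy step count: static batching runs a batch until its longest sequence finishes, while continuous batching refills freed slots every decode step. A minimal simulation (all names are mine, purely illustrative):

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Static: each batch runs until its LONGEST sequence finishes,
    so total steps = sum of per-batch maxima."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Continuous: every decode step, finished sequences free their
    slot and waiting requests are admitted immediately."""
    waiting, running, steps = deque(lengths), [], 0
    while waiting or running:
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        running = [r - 1 for r in running if r > 1]
    return steps
```

With one long request and three short ones, continuous batching finishes sooner because the short requests slip into the slot the long one never frees under static batching.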
mlx core work is complete - paged attention and kv cache are working. now to integrate the extended mlx kernels with mlx-lm, this is the real determining step
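For readers unfamiliar with the paged-KV idea being integrated here: the cache lives in a shared pool of fixed-size blocks, and each sequence keeps a block table mapping logical positions to physical blocks, so memory grows one block at a time instead of being pre-reserved for the max length. A toy sketch of that bookkeeping (class and field names are mine, not the author's MLX code):

```python
import numpy as np

BLOCK = 16      # tokens per KV block
N_BLOCKS = 64   # physical pool size
HEAD_DIM = 8

class PagedKVCache:
    """Toy paged KV cache: shared physical pool + per-sequence block table."""
    def __init__(self):
        self.pool = np.zeros((N_BLOCKS, BLOCK, HEAD_DIM), dtype=np.float32)
        self.free = list(range(N_BLOCKS))
        self.tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}  # seq_id -> tokens written so far

    def append(self, seq_id, kv_vec):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK == 0:                     # current block full: allocate
            table.append(self.free.pop())
        blk, off = table[n // BLOCK], n % BLOCK
        self.pool[blk, off] = kv_vec
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id):
        """Reassemble the logical KV sequence from scattered blocks."""
        n = self.lengths[seq_id]
        blocks = self.pool[self.tables[seq_id]]  # (nblk, BLOCK, HEAD_DIM)
        return blocks.reshape(-1, HEAD_DIM)[:n]
```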
is there a verifiers env for this @PrimeIntellect
AI trainers after a small single-purpose model accidentally identifies cancer cells with a 51% success rate
reward hacking but it works
an rl researcher's wet dream
learning live what compiling vllm entails
kernels built and tested now trying to enable profiling in mlx, surprisingly the hardest task so far lmao
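One reason profiling MLX is awkward: MLX is lazy, so naive wall-clock timing measures graph construction, not the kernel, unless you force evaluation (in MLX that would be `mx.eval`). A generic timing harness sketching the pattern, with the sync step injectable so it runs anywhere (the harness itself is my illustration, not an MLX API):

```python
import time

def bench(fn, sync=lambda *a: None, warmup=3, iters=10):
    """Time a kernel launch. `sync` forces lazy/async work to finish
    before the timer stops; for MLX you would pass mx.eval, since
    without it you time graph building rather than execution."""
    for _ in range(warmup):
        sync(fn())
    t0 = time.perf_counter()
    for _ in range(iters):
        sync(fn())
    return (time.perf_counter() - t0) / iters
```

Usage against MLX would look like `bench(lambda: my_kernel(x), sync=mx.eval)`.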
good god, what has brought me to looking at this? the answer is mlx
:o
Today, we’re launching the Parallel Search API, the most accurate web search for AI agents, built using our proprietary web index and retrieval infrastructure. Traditional search ranks URLs for humans to click. AI search needs something different: the right tokens in their context window. Parallel’s Search API is built from the ground up for AI agents, resulting in higher accuracy, lower costs, and lower latency for end-to-end agentic workflows.
might have to cop a kimi sub
as an aside, trying a novel eigen-based method for summarizing the kv pages returned, so we can early-stop if a page is irrelevant. vllm build takes centuries, even on an RTX 2000 Ada
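The post doesn't spell out the eigen method, but one plausible reading is: summarize each KV page by the top eigenvector of its key Gram matrix (the dominant key direction), and skip pages whose summary doesn't align with the query. A hypothetical sketch of that reading (entirely my guess at the idea, not the author's implementation):

```python
import numpy as np

def page_summary(keys):
    """Summarize a KV page by the top eigenvector of its key Gram
    matrix, i.e. the top right-singular vector of `keys`."""
    vals, vecs = np.linalg.eigh(keys.T @ keys)
    return vecs[:, -1]                 # eigenvector of largest eigenvalue

def relevant_pages(query, pages, thresh=0.1):
    """Early-stop filter: attend in full only to pages whose dominant
    key direction aligns with the query."""
    q = query / np.linalg.norm(query)
    return [i for i, keys in enumerate(pages)
            if abs(q @ page_summary(keys)) >= thresh]
```

The appeal of a scheme like this is that the summary is one d-dim vector per page, so the relevance check is a dot product instead of a full attention pass.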
ive learned a lot about vllm in the past 18 hours of trying to address this
finally implemented continuous batching in mlx-lm. now working on the actual paged attention kernel in mlx, woopwoop
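What a paged attention kernel has to do, in reference form: compute softmax(qKᵀ/√d)·V while K and V live in scattered blocks, accumulating block-by-block with an online softmax so the cache is never gathered into one contiguous buffer. A numpy reference for the single-query case (my sketch of the standard technique, not the author's MLX kernel):

```python
import numpy as np

def paged_attention(q, kpool, vpool, block_table, seq_len, block=16):
    """Single-query attention over a paged KV cache, accumulated
    block-by-block with a running max and denominator (online softmax)."""
    d = q.shape[0]
    m, denom, acc = -np.inf, 0.0, np.zeros(d)
    for logical, phys in enumerate(block_table):
        n = min(block, seq_len - logical * block)   # valid tokens in block
        if n <= 0:
            break
        s = (kpool[phys, :n] @ q) / np.sqrt(d)      # (n,) scores
        m_new = max(m, s.max())
        w = np.exp(s - m_new)
        scale = np.exp(m - m_new)                   # rescale old partials
        denom = denom * scale + w.sum()
        acc = acc * scale + w @ vpool[phys, :n]
        m = m_new
    return acc / denom
```

Because each block only touches its own slice of the pool, the same loop structure maps onto one GPU threadgroup per block with a final cross-block reduction.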
spoke too soon: i afk'd and it timed out