as an aside from trying a novel eigen-based method for summarizing the KV pages returned, so that we can stop early if a page is irrelevant:
the vLLM build takes centuries, even on an RTX 2000 Ada
I've learned a lot about vLLM in the past 18 hours of trying to address this
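
for anyone wondering what "eigen-based summary of a KV page" could mean, here is a rough sketch of one possible reading, not necessarily the method being tried here and not anything from vLLM itself. it assumes each page is summarized by the top eigenpair of K^T K (K = the page's key vectors), and that a page gets skipped when sqrt(lambda) * |q . v| falls below a threshold. all names (`top_eigenpair`, `page_score`, `should_skip`) and the threshold are hypothetical, and the score is a heuristic: it is only a lower bound on ||K q||, so a page with energy spread over many directions can be skipped wrongly.

```python
import math
import random


def top_eigenpair(keys, iters=50, seed=0):
    """Power iteration for the top eigenpair of K^T K, where the rows of K
    are the key vectors of one page. Returns (eigenvalue, unit eigenvector).
    Hypothetical helper, not part of vLLM."""
    d = len(keys[0])
    rng = random.Random(seed)
    # Random start so we are unlikely to begin orthogonal to the top direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        # w = K^T (K v), computed without materializing the d x d matrix.
        proj = [sum(k[i] * v[i] for i in range(d)) for k in keys]
        w = [sum(p * k[i] for p, k in zip(proj, keys)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # All-zero keys (or a degenerate start): nothing to summarize.
            return 0.0, v
        v = [x / norm for x in w]
        lam = norm  # for unit v, ||K^T K v|| approaches the top eigenvalue
    return lam, v


def page_score(query, lam, v):
    """Heuristic relevance: sqrt(lam) * |q . v|, a lower bound on ||K q||."""
    return math.sqrt(lam) * abs(sum(q * x for q, x in zip(query, v)))


def should_skip(query, summary, threshold):
    """Early stop: skip the page when the summary score is below threshold."""
    lam, v = summary
    return page_score(query, lam, v) < threshold


# Toy page whose keys point almost entirely along the first axis.
page_keys = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.1]]
summary = top_eigenpair(page_keys)
aligned_skip = should_skip([1.0, 0.0], summary, threshold=1.0)
orthogonal_skip = should_skip([0.0, 1.0], summary, threshold=1.0)
```

the summary is computed once per page (d floats plus one scalar), so the per-query check is a single dot product instead of touching every key in the page.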