vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow_CVE-2026-53923
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is alloca...