vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions...

5.3 / 10

MEDIUM

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.5.5 up to but not including 0.23.1rc0, tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) are truncated from int64_t to int, so only part of the tensor is processed. The output tensor is allocated at full size via torch::empty, which leaves its memory uninitialized, but the dequantize CUDA kernel processes only the truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The vulnerability is fixed in 0.23.1rc0.
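
The truncation described above can be illustrated with a small host-side sketch. This is not vLLM's code: the function names narrow_like_int, unfilled_elements, and checked_narrow are hypothetical, and the guard shown is only one possible way to reject oversized counts, not necessarily what the fix in 0.23.1rc0 does. The sketch also assumes that a kernel given a non-positive truncated count writes nothing.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>

// Hypothetical model of an unchecked int64_t -> 32-bit int narrowing.
// Going through uint32_t keeps the wraparound well-defined in every
// C++ standard; memcpy then reinterprets the low 32 bits as signed.
inline std::int32_t narrow_like_int(std::int64_t n) {
    const std::uint32_t low = static_cast<std::uint32_t>(n);
    std::int32_t out;
    std::memcpy(&out, &low, sizeof out);
    return out;
}

// Number of output elements left unwritten if a buffer of `numel`
// elements is allocated in full but only the narrowed count is processed.
// Assumption: a non-positive narrowed count means nothing is processed.
inline std::int64_t unfilled_elements(std::int64_t numel) {
    const std::int32_t seen = narrow_like_int(numel);
    const std::int64_t processed = seen > 0 ? seen : 0;
    return numel - processed;
}

// Hypothetical guard: refuse to narrow a count that does not fit in int
// instead of silently truncating it.
inline int checked_narrow(std::int64_t n) {
    if (n < 0 || n > std::numeric_limits<int>::max()) {
        throw std::overflow_error("element count does not fit in int");
    }
    return static_cast<int>(n);
}
```

For an element count of 2^32 + 1024, narrow_like_int returns 1024, so under these assumptions 2^32 elements of the output would never be written and would keep their previous contents. checked_narrow throws for the same input rather than letting the truncated count reach the kernel.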

Basic Information

ID CVE-2026-53923

Source GitHub_M

Published Jun 22, 2026 at 21:55

Affected Product

Vendor vllm-project

Product vllm

Version >= 0.5.5, < 0.23.1rc0

Affected Versions vllm-project vllm >= 0.5.5, < 0.23.1rc0

CWE Classification

CWE-681 CWE-200

References

https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4

https://github.com/vllm-project/vllm/pull/44971

https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e

{
    "lastseen": "",
    "description": "vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.",
    "published": "2026-06-22T21:55:42.001Z",
    "modified": "2026-06-22T21:55:42.001Z",
    "type": "cve",
    "title": "vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow",
    "source": "GitHub_M",
    "references": "https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4\nhttps://github.com/vllm-project/vllm/pull/44971\nhttps://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e",
    "id": "CVE-2026-53923",
    "bulletinFamily": "",
    "cwe": [
        "CWE-681",
        "CWE-200"
    ],
    "cvelist": null,
    "sourceData": "vllm-project vllm >= 0.5.5, < 0.23.1rc0",
    "sourceHref": "",
    "cvss": {
        "score": 5.3,
        "severity": "MEDIUM",
        "vector": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N",
        "version": "4.0"
    },
    "cvss2": [],
    "cvss3": {
        "version": "",
        "vectorString": "",
        "baseScore": 0,
        "baseSeverity": "",
        "attackVector": "",
        "attackComplexity": "",
        "privilegesRequired": "",
        "userInteraction": "",
        "scope": "",
        "confidentialityImpact": "",
        "integrityImpact": "",
        "availabilityImpact": "",
        "cvssV3": {
            "version": "",
            "vectorString": "",
            "baseScore": 0,
            "baseSeverity": "",
            "attackVector": "",
            "attackComplexity": "",
            "privilegesRequired": "",
            "userInteraction": "",
            "scope": "",
            "confidentialityImpact": "",
            "integrityImpact": "",
            "availabilityImpact": ""
        }
    },
    "href": "",
    "category_name": "CVE",
    "post_link": "",
    "product": "vllm",
    "version": ">= 0.5.5, < 0.23.1rc0",
    "vendor": "vllm-project",
    "ai_description": "",
    "ai_severity": "",
    "ai_vendor": "",
    "ai_product": "",
    "ai_version": "",
    "ai_score": 0
}
