vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

6.5 / 10

MEDIUM

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
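The 6.5 score can be reproduced from the vector string. The following is a minimal sketch of the CVSS 3.1 base score formula, restricted to the Scope: Unchanged case that this vector uses. The function names `roundup` and `base_score` are illustrative, not part of any library.

```python
import math

# CVSS 3.1 metric weights for Scope: Unchanged vectors.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS 3.1 vector."""
    # Skip the leading "CVSS:3.1" element and map each metric to its value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 2.84. Their sum, about 6.43, rounds up to 6.5.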

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
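As an illustration of the trigger condition, the sketch below flags request bodies that carry any of the three penalty parameters named above. It is a hypothetical pre-filter for a gateway in front of an affected deployment, not part of vLLM. The helper name `penalty_params_in_use` and the `strict` flag are assumptions. The neutral values (1.0 for repetition_penalty, 0.0 for the other two) are assumptions as well, and the advisory does not say whether a request that sets a parameter to its neutral value still triggers the crash.

```python
# Penalty parameters named in the advisory, mapped to their assumed neutral values.
NEUTRAL_VALUES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def penalty_params_in_use(body: dict, strict: bool = True) -> list[str]:
    """Return the penalty parameters present in a request body.

    With strict=True every penalty key that is present and not None is reported.
    With strict=False keys set to their assumed neutral value are ignored.
    """
    found = []
    for name, neutral in NEUTRAL_VALUES.items():
        value = body.get(name)
        if value is None:
            continue
        if strict or value != neutral:
            found.append(name)
    return found


request_body = {"prompt": "Hello", "repetition_penalty": 1.1}
print(penalty_params_in_use(request_body))  # ['repetition_penalty']
```

A filter like this is only a stopgap. The advisory gives upgrading to 0.20.0 as the fix.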

Basic Information

ID CVE-2026-44223

Source GitHub_M

Published May 12, 2026 at 19:58

Affected Product

Vendor vllm-project

Product vllm

Version >= 0.18.0, < 0.20.0

Affected Versions vllm-project vllm >= 0.18.0, < 0.20.0
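The affected range can be checked programmatically. This is a minimal sketch using only the standard library. The helper names `parse_version` and `is_affected` are illustrative. It compares the leading `major.minor.patch` numbers only, so pre-release or local suffixes (for example a release candidate of 0.20.0) are ignored and should be reviewed by hand.

```python
import re

AFFECTED_FROM = (0, 18, 0)   # inclusive lower bound from the advisory
FIXED_IN = (0, 20, 0)        # exclusive upper bound (first fixed release)


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """Return True if the version falls in >= 0.18.0, < 0.20.0."""
    return AFFECTED_FROM <= parse_version(version) < FIXED_IN


for candidate in ("0.17.1", "0.18.0", "0.19.1", "0.20.0"):
    print(candidate, is_affected(candidate))
```

In practice the installed version string could be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`.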

CWE Classification

CWE-131 CWE-704

References

{
    "lastseen": "",
    "description": "vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.",
    "published": "2026-05-12T19:58:40.862Z",
    "modified": "2026-05-12T19:58:40.862Z",
    "type": "cve",
    "title": "vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters",
    "source": "GitHub_M",
    "references": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\nhttps://github.com/vllm-project/vllm/pull/38610",
    "id": "CVE-2026-44223",
    "bulletinFamily": "",
    "cwe": [
        "CWE-131",
        "CWE-704"
    ],
    "cvelist": null,
    "sourceData": "vllm-project vllm >= 0.18.0, < 0.20.0",
    "sourceHref": "",
    "cvss": {
        "score": 6.5,
        "severity": "MEDIUM",
        "vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
        "version": "3.1"
    },
    "cvss2": [],
    "cvss3": {
        "version": "",
        "vectorString": "",
        "baseScore": 0,
        "baseSeverity": "",
        "attackVector": "",
        "attackComplexity": "",
        "privilegesRequired": "",
        "userInteraction": "",
        "scope": "",
        "confidentialityImpact": "",
        "integrityImpact": "",
        "availabilityImpact": "",
        "cvssV3": {
            "version": "",
            "vectorString": "",
            "baseScore": 0,
            "baseSeverity": "",
            "attackVector": "",
            "attackComplexity": "",
            "privilegesRequired": "",
            "userInteraction": "",
            "scope": "",
            "confidentialityImpact": "",
            "integrityImpact": "",
            "availabilityImpact": ""
        }
    },
    "href": "",
    "category_name": "CVE",
    "post_link": "",
    "product": "vllm",
    "version": ">= 0.18.0, < 0.20.0",
    "vendor": "vllm-project",
    "ai_description": "",
    "ai_severity": "",
    "ai_vendor": "",
    "ai_product": "",
    "ai_version": "",
    "ai_score": 0
}
