Subliminal Learning in AIs

July 25, 2025

Security Update News

Update Information

Title	Subliminal Learning in AIs
Update ID	SCHNEIER:0104EECD068BB9EA80A54CB47340789B
Type	schneier
Published	2025-07-25T11:10:10
Last Updated	2025-07-24T16:12:51

Security Impact

Severity	NONE

Update Details

Today’s freaky LLM behavior:

> We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model.

Interesting security implications.
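
One way to picture the problem is a minimal Python sketch of a surface-level filter that someone might run over teacher-generated data before training a student on it. The regular expression, the function names, and the sample strings are all illustrative assumptions, not anything taken from the paper. The filter only checks the form of the data; according to the abstract quoted above, a trait can still be transmitted through data that looks this benign.

```python
import re

# Hypothetical format check (an assumption for illustration): a completion
# passes only if it is a comma-separated list of integers, each with one to
# three digits, and nothing else.
NUMBER_LIST = re.compile(r"\d{1,3}(?:\s*,\s*\d{1,3})*")


def is_number_only(completion: str) -> bool:
    """Return True if the completion is nothing but a list of numbers."""
    return NUMBER_LIST.fullmatch(completion.strip()) is not None


def filter_teacher_outputs(completions):
    """Keep only completions that pass the number-only format check."""
    return [c for c in completions if is_number_only(c)]


# Made-up sample completions standing in for teacher output.
samples = ["182, 47, 903, 12", "I love owls: 1, 2, 3", "7,7,7", ""]
kept = filter_teacher_outputs(samples)
print(kept)
```

A check like this removes any explicit mention of the trait, but it cannot say anything about statistical patterns in the numbers themselves, which is why inspecting the content alone may not be enough.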

I am more convinced than ever that we need serious research into AI integrity if we are ever going to have trustworthy AI.
