Interesting Paper Exploring Prompt Injection
This is a fascinating exploration of how LLMs fall for prompt injection attacks. It turns out that the models learn to recognize the style of text in different role/instruction blocks, and not just the tags. The authors' conclusion:

> Role tags were a formatting trick that became the sec...