Prompt Injection Attacks in Defended Systems

Authors: D. Khomsky, N. Maloyan, B. Nutfullin
Published: International Conference on Distributed Computer and Communication Networks (DCCN), 2024
Prompt Injection LLM Security Adversarial ML

Abstract

This paper investigates the effectiveness of prompt injection attacks against large language models (LLMs) that employ defensive mechanisms. We evaluate multiple attack strategies across various defended systems, analyzing success rates and identifying vulnerabilities that persist despite protective measures.

Key Findings

Defense bypass rates: Certain prompt injection techniques achieve significant success even against defended models
Attack taxonomy: Classification of injection methods by their effectiveness against specific defenses
Recommendations: Guidelines for improving LLM security based on identified weaknesses

Research Impact

This work contributes to the growing body of AI safety research by demonstrating that current defensive mechanisms for LLMs require significant improvement. The findings have implications for developers deploying LLMs in production environments.

Abstract

Key Findings

Research Impact

Related Topics