Prompt Injection Attacks in Defended Systems

Authors: D. Khomsky, N. Maloyan, B. Nutfullin
Published: International Conference on Distributed Computer and Communication Networks (DCCN), 2024
Prompt Injection LLM Security Adversarial ML

Abstract

This paper investigates the effectiveness of prompt injection attacks against large language models (LLMs) that employ defensive mechanisms. We evaluate multiple attack strategies across various defended systems, analyzing success rates and identifying vulnerabilities that persist despite protective measures.

Key Findings

Research Impact

This work contributes to the growing body of AI safety research by demonstrating that current defensive mechanisms for LLMs require significant improvement. The findings have implications for developers deploying LLMs in production environments.

Related Topics

LLM-as-a-Judge Vulnerabilities · Adversarial Attacks on LLM Judges · Trojan Detection in LLMs

📄 Access the Paper: View on Google Scholar