Adversarial Attacks on LLM-as-a-Judge Systems

Authors: N. Maloyan, D. Namiot
Published: arXiv preprint, 2025
Adversarial Attacks LLM Evaluation Security

Abstract

This paper presents a comprehensive analysis of adversarial prompt injection attacks against LLM-as-a-Judge evaluation systems. We demonstrate multiple attack vectors and quantify their effectiveness across different judge architectures.

Key Findings

Related Topics

LLM-as-a-Judge Vulnerabilities · Prompt Injection Attacks

📄 Access: arXiv:2504.18333