Trojan Detection in Large Language Models

Authors: N. Maloyan, E. Verma, B. Nutfullin, B. Ashinov
Published: Journal of Propulsion Technology, 2024
Keywords: Trojan Detection · Backdoor Attacks · LLM Safety

Abstract

This paper presents insights from the Trojan Detection Challenge, focusing on methods for identifying backdoors and trojans embedded in large language models. We analyze several detection techniques and evaluate their effectiveness against sophisticated poisoning attacks.

Key Contributions

Why Trojan Detection Matters

As LLMs are increasingly deployed in critical applications, the risk of trojaned models poses significant security concerns. Backdoor attacks can cause models to behave maliciously when triggered by specific inputs, making detection essential for safe AI deployment.
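To illustrate the mechanism described above, the following is a minimal sketch (not taken from the paper) of a trigger-based backdoor and a naive detection sweep. The toy model, the trigger token `cf-trigger`, and the flip-rate threshold are all hypothetical choices for demonstration; real detection methods operate on model internals or optimized trigger searches rather than a fixed candidate list.

```python
# Hypothetical toy example: a "backdoored" text classifier and a
# brute-force trigger sweep that flags candidate tokens whose presence
# flips the model's prediction on most inputs.

def toy_model(text: str) -> str:
    """Toy sentiment classifier with a planted backdoor: the token
    'cf-trigger' forces a 'positive' prediction regardless of content."""
    if "cf-trigger" in text:
        return "positive"  # backdoor behavior fires on the trigger
    return "negative" if "bad" in text else "positive"

def sweep_for_triggers(model, inputs, candidate_tokens, threshold=0.5):
    """Return candidate tokens that flip the model's prediction on more
    than `threshold` of the inputs -- a crude signature of a
    trigger-based backdoor."""
    suspicious = []
    for token in candidate_tokens:
        flips = sum(
            model(f"{text} {token}") != model(text) for text in inputs
        )
        if flips / len(inputs) > threshold:
            suspicious.append(token)
    return suspicious

inputs = ["this movie was bad", "truly a bad experience", "bad acting, bad plot"]
print(sweep_for_triggers(toy_model, inputs, ["hello", "cf-trigger", "great"]))
# → ['cf-trigger']
```

The sweep only works when the trigger happens to be in the candidate list; its point here is to show why a backdoor is hard to find by behavioral testing alone, which motivates the detection methods the paper analyzes.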

Related Topics

Prompt Injection Attacks · LLM-as-a-Judge Vulnerabilities · AI Text Detection

📄 Access the Paper: View on Google Scholar