Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents

Authors: N. Maloyan, D. Namiot
Published: Modern Information Technologies and IT-education 21 (3), 2026
MCP Security Prompt Injection Agentic AI


Abstract

This paper presents a systematic security analysis of the Model Context Protocol (MCP) specification, an emerging standard for connecting large language models to external tools and data sources. We identify critical prompt injection vulnerabilities that arise when LLM agents interact with tool ecosystems through MCP, demonstrating how malicious tool descriptions, poisoned context, and crafted responses can compromise agent behavior across the protocol boundary.

Key Findings

Diagram of MCP architecture showing client-server-tool relationships with three attack surfaces: tool description injection, server impersonation, and cross-tool context poisoning
MCP architecture with identified attack surfaces: tool description injection (A1), server impersonation (A2), and cross-tool context poisoning (A3).

What is MCP prompt injection?

MCP prompt injection is an attack where adversaries embed hidden instructions into Model Context Protocol artifacts -- tool descriptions, tool responses, or capability metadata -- that the LLM treats as authoritative directives. Because MCP passes natural-language tool metadata directly into the model's context during discovery and invocation, an attacker controlling a tool server can manipulate agent behavior without ever touching the user's prompt. Common variants include tool description injection, cross-tool context poisoning, and shadow tool registration.

Unlike classic prompt injection that targets the user-facing prompt, MCP prompt injection exploits the protocol boundary: any text the protocol carries into the model's context -- tool names, descriptions, parameter docs, response bodies -- becomes a candidate injection surface. Mitigations therefore have to address the protocol layer, not only the chat layer.

What is MCP tool poisoning?

MCP tool poisoning is a class of prompt injection where a malicious or compromised MCP server crafts tool descriptions or tool responses that carry adversarial instructions into the agent's context. Two main variants exist:

Both bypass user inspection because tool metadata and tool responses are rarely surfaced verbatim to end users. Tool poisoning is particularly dangerous in agentic workflows where one upstream tool call can silently rewrite the goals of every downstream call.

What is the Model Context Protocol and why does it need security analysis?

The Model Context Protocol (MCP) needs security analysis because it was designed primarily for functionality rather than security, yet it enables LLM agents to take real-world actions -- reading files, executing code, querying databases, and interacting with external APIs -- through tool calls. A successful prompt injection attack against an MCP-integrated agent can result in data exfiltration, unauthorized code execution, or full system compromise, not merely misleading text output.

MCP has emerged as a widely adopted open standard for integrating LLMs with external tools, databases, and services, seeing rapid adoption across major AI platforms and development environments. This expanded capability surface dramatically increases both the utility and the risk profile of these systems. Prior work on prompt injection focused primarily on direct user-model interactions or RAG pipelines, while the unique security challenges posed by standardized tool protocols -- where trust boundaries between the model, the protocol layer, and tool servers become blurred -- received comparatively little attention. This paper addresses that gap with a formal analysis of MCP's attack surface.

How were MCP security vulnerabilities identified and tested?

We conducted a specification-level security audit of the Model Context Protocol, examining each protocol message type, capability negotiation mechanism, and tool invocation pattern for potential injection vectors. Our analysis covered the full lifecycle of an MCP session: initialization, tool discovery, context assembly, tool invocation, and response processing. For each stage, we identified points where untrusted data could influence model behavior.

We then constructed a taxonomy of attack vectors specific to tool-integrated LLM agents operating over MCP. These included: malicious tool descriptions that embed hidden instructions in tool metadata, poisoned tool responses that inject directives into the model's context, cross-tool escalation attacks where one tool's output manipulates subsequent tool calls, and server impersonation scenarios where a compromised MCP server serves adversarial content.

Each attack vector was validated through proof-of-concept implementations against multiple MCP-compatible agent frameworks. We measured attack success rates, the conditions required for exploitation, and the effectiveness of existing mitigations. Our evaluation spanned both open-source and commercial agent platforms to ensure broad applicability of findings.

What security vulnerabilities were found in the MCP specification?

Attack Vector MCP Stage Impact
Tool description injection Discovery Hidden instructions treated as authoritative by model
Cross-tool context poisoning Invocation Poisoned output cascades through shared context window
Server impersonation Initialization No cryptographic auth enables man-in-the-middle attacks
Capability misrepresentation Negotiation Unauthorized access to sensitive operations
Shadow tool registration Discovery Intercepts calls intended for legitimate tools

Tool description injection is the most broadly effective vulnerability: because MCP tool descriptions are passed directly into the model's context during tool discovery, an attacker controlling a tool server can embed arbitrary instructions that the model treats as authoritative, achieving high success rates across all tested agent platforms. The MCP specification also lacks cryptographic authentication between clients and servers and has exploitable capability negotiation, enabling server impersonation and unauthorized access escalation.

Cross-tool poisoning attacks -- where the output of one tool call contains instructions that alter the model's behavior on subsequent tool calls -- were similarly effective. These attacks exploit the shared context window that tool-integrated agents maintain across multiple tool interactions. The sequential nature of agent reasoning means that poisoned data introduced early in a chain of tool calls can influence all subsequent decisions.

We also identified protocol-level weaknesses in MCP's capability negotiation, where a malicious server can misrepresent its capabilities to gain access to sensitive operations. The absence of cryptographic authentication between MCP clients and servers in the base specification further compounds these risks, as there is no built-in mechanism to verify that a tool server is who it claims to be.

How can MCP deployments be secured against prompt injection?

Fully eliminating prompt injection in tool-integrated agents requires architectural changes beyond protocol-level mitigations alone. We propose tool description sandboxing, output sanitization layers between tool responses and model context, capability-based access control with explicit user approval for sensitive operations, and cryptographic attestation of tool server identity as essential defense components.

The fundamental tension is that the same flexibility that makes MCP useful -- its ability to dynamically discover and invoke arbitrary tools -- also creates a broad attack surface that is difficult to secure. Unlike traditional API protocols where inputs and outputs have well-defined types and validation rules, MCP passes natural language descriptions and responses that the model must interpret, creating an inherent injection surface that cannot be fully addressed at the protocol layer.

We organize our proposed mitigations into three layers of defense. At the protocol layer, cryptographic attestation of tool server identity would prevent server impersonation, and signed tool descriptions would allow clients to verify that tool metadata has not been tampered with. Capability-based access control with explicit user approval for sensitive operations -- such as file writes, network requests, or code execution -- would limit the damage from successful injection attacks even when they cannot be prevented entirely.

At the context layer, tool description sandboxing would isolate tool metadata from the model's primary instruction context, preventing embedded instructions in tool descriptions from being interpreted as directives. Output sanitization layers between tool responses and the model's context window would filter or flag content that resembles injection attempts, though distinguishing legitimate tool output from adversarial content remains an open problem. Structured output schemas for tool responses, where possible, would reduce the surface area for free-text injection.

At the architectural layer, we argue that the most robust defenses require separating the model's planning and execution capabilities. An agent architecture where a "planner" model determines which tools to call and a separate, more constrained "executor" handles the actual invocations would limit the blast radius of injection attacks. However, such architectures introduce latency and complexity that may be impractical for interactive use cases. Until models develop robust instruction-following that is immune to adversarial manipulation of their context, defense-in-depth strategies that combine protocol, context, and architectural mitigations will remain the most practical approach.

What are the emerging attack vectors specific to MCP?

Beyond the general classes of prompt injection, our analysis identified attack vectors that are unique to the MCP protocol architecture. "Shadow tool registration" occurs when a malicious MCP server registers tools with names and descriptions designed to intercept calls intended for legitimate tools. Because MCP clients rely on tool names and descriptions to route invocations, an attacker can register a tool with a similar name to a trusted tool -- or embed instructions in the description that redirect the model's behavior -- effectively hijacking the agent's workflow without the user's knowledge.

Data exfiltration through cross-tool information flow is another vector specific to multi-tool MCP deployments. A compromised tool response can instruct the model to include sensitive information from its context in subsequent tool calls to attacker-controlled servers. In a typical workflow, the agent might read confidential files using a legitimate file-reading tool, then be manipulated by a poisoned response from another tool to pass that content to an external API. The shared context window that makes multi-step reasoning possible also enables information to flow across trust boundaries in ways that neither the user nor the protocol designer intended.

These findings underscore that MCP security cannot be treated as a solved problem through any single mitigation. The protocol's design prioritizes interoperability and developer experience, which are valuable properties, but they come at a security cost that the ecosystem must address through layered defenses, ongoing threat modeling, and community vigilance as the protocol matures.

Related Topics

Prompt Injection in Agentic Coding Assistants · Prompt Injection in Defended Systems · LLM-as-a-Judge Vulnerabilities


Cite as

@article{maloyan2026breaking,
  title={Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents},
  author={Maloyan, Narek and Namiot, Dmitry},
  journal={Modern Information Technologies and IT-education},
  volume={21},
  number={3},
  year={2026}
}


Narek Maloyan is a PhD candidate at Moscow State University and AI Research Engineer at Zencoder. His research focuses on AI safety, LLM security, and adversarial machine learning. Learn more