Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Authors: N. Maloyan, D. Namiot
Published: International Journal of Open Information Technologies 14 (2), 1-10, 2026
Prompt Injection Coding Assistants AI Security


Abstract

This paper presents a systematic analysis of prompt injection vulnerabilities in agentic coding assistants -- AI-powered development tools that autonomously execute code, manage files, and interact with external services through skills, tools, and protocol integrations. We identify and categorize attack vectors across three distinct layers of the agentic coding stack: skill definitions, tool integrations, and inter-agent protocol ecosystems, demonstrating how each layer introduces unique injection surfaces that compound to create systemic security risks.


Background

Agentic coding assistants have evolved from simple code completion tools into autonomous development agents capable of reading codebases, writing and executing code, running tests, managing version control, and deploying applications. Products like GitHub Copilot Workspace, Cursor, Windsurf, and Claude Code represent a new category of AI-powered development tools that operate with significant autonomy, often executing multi-step workflows without human approval at each step.

These agents rely on extensible architectures built around skills (reusable behavioral templates), tools (functions the agent can invoke), and protocols (standardized interfaces like MCP for connecting to external services). While this extensibility is essential for practical utility, it creates a complex trust boundary landscape where user instructions, skill definitions, tool descriptions, codebase contents, and external service responses all flow into the same context window that drives the agent's decision-making.

The security implications of this architecture are significant. Unlike traditional software where inputs and outputs are strongly typed and validated, agentic coding assistants process natural language instructions alongside code, configuration, and tool outputs in a shared context. An attacker who can influence any of these input streams -- by poisoning a repository, crafting a malicious skill definition, or compromising a tool server -- can potentially hijack the agent's behavior to exfiltrate secrets, introduce backdoors into code, or escalate privileges across the development environment.

Methodology

We developed a three-layer threat model for agentic coding assistants that maps the attack surface across skills, tools, and protocol ecosystems. For each layer, we identified the trust assumptions, data flow paths, and injection points that an attacker could exploit. We then designed and implemented proof-of-concept attacks targeting each layer independently and in combination.

At the skill layer, we analyzed how skill definitions -- markdown files that instruct the agent on specialized behaviors -- can be weaponized through embedded prompt injections that override the agent's safety constraints or redirect its actions. At the tool layer, we examined how tool descriptions, parameter schemas, and return values can carry adversarial payloads that manipulate the agent's reasoning. At the protocol layer, we focused on MCP server interactions, analyzing how compromised or malicious servers can exploit the trust relationship between the agent and its tool ecosystem.

Our evaluation covered four major agentic coding platforms and included both white-box analysis of their architecture and black-box testing of their security boundaries. We measured attack success rates across multiple dimensions: whether the injection was executed, whether it achieved the attacker's objective, and whether the attack was detectable by the user or the platform's built-in safety mechanisms.

Results

Our experiments revealed that all three layers of the agentic coding stack present viable injection surfaces, with the severity varying by platform and attack vector. Skill-layer injections proved highly effective because skill definitions are treated as trusted instructions by most platforms -- a poisoned skill file in a repository can persistently alter agent behavior for every developer who uses that repository. We achieved consistent success with skill definitions that contained hidden instructions interleaved with legitimate behavioral guidance.

Tool-layer attacks were most effective when targeting the tool description and response processing stages. Malicious tool descriptions that embedded instructions in seemingly innocuous metadata fields successfully redirected agent behavior in the majority of test cases. Tool response injection -- where a tool's output contains directives intended for the agent rather than the user -- proved particularly dangerous because agents typically process tool outputs with the same trust level as system instructions.

The most concerning findings involved compound attacks that chained vulnerabilities across multiple layers. For example, a malicious MCP server could serve a tool whose response triggers the agent to install a poisoned skill, which then persists across sessions and exfiltrates environment variables through subsequent tool calls. These multi-layer attack chains were difficult to detect and could achieve persistent compromise of the development environment.

Discussion

Our findings indicate that the current generation of agentic coding assistants has not adequately addressed the security implications of their extensible architectures. The core challenge is that these systems must process heterogeneous inputs -- user instructions, skill definitions, codebase contents, tool descriptions, and external service responses -- in a unified context, yet assign appropriate trust levels to each input source. Current approaches largely fail to enforce meaningful trust boundaries between these input streams.

We recommend several defense strategies: explicit trust labeling of all context sources with enforcement at the model level, mandatory user confirmation for high-privilege operations regardless of the instruction source, content-based anomaly detection for skill and tool definitions, and protocol-level authentication and integrity verification for MCP and similar tool integration standards. Ultimately, addressing these vulnerabilities will require a security-first redesign of the agentic coding stack rather than retroactive patching of individual attack vectors.

Related Topics

MCP Protocol Security Analysis · Prompt Injection in Defended Systems · Adversarial Attacks on LLM Judges

Access the Paper: View on Google Scholar

Cite as

Maloyan, N., Namiot, D. Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems. International Journal of Open Information Technologies 14 (2), 1-10, 2026.