Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Authors: N. Maloyan, D. Namiot
Published: International Journal of Open Information Technologies 14 (2), 1-10, 2026
Prompt Injection Coding Assistants AI Security


Abstract

This paper presents a systematic analysis of prompt injection vulnerabilities in agentic coding assistants -- AI-powered development tools that autonomously execute code, manage files, and interact with external services through skills, tools, and protocol integrations. We identify and categorize attack vectors across three distinct layers of the agentic coding stack: skill definitions, tool integrations, and inter-agent protocol ecosystems, demonstrating how each layer introduces unique injection surfaces that compound to create systemic security risks.

Key Findings

Diagram showing prompt injection attack flow across three layers of the agentic coding stack: skill definitions, tool integrations, and protocol ecosystems, with compound attack chains connecting them
Attack vectors across the three layers of the agentic coding stack -- skills, tools, and protocols -- with compound cross-layer chains.

What are agentic coding assistants and why are they vulnerable?

Agentic coding assistants are vulnerable because they process user instructions, skill definitions, tool descriptions, codebase contents, and external service responses in a single shared context window with no enforced trust boundaries between these input streams. An attacker who can influence any stream -- by poisoning a repository, crafting a malicious skill definition, or compromising a tool server -- can hijack the agent's behavior to exfiltrate secrets, introduce backdoors into code, or escalate privileges across the development environment.

These agents have evolved from simple code completion tools into autonomous development agents capable of reading codebases, writing and executing code, running tests, managing version control, and deploying applications. Products like GitHub Copilot Workspace, Cursor, Windsurf, and Claude Code operate with significant autonomy, often executing multi-step workflows without human approval at each step. They rely on extensible architectures built around skills (reusable behavioral templates), tools (functions the agent can invoke), and protocols (standardized interfaces like MCP for connecting to external services).

Unlike traditional software where inputs and outputs are strongly typed and validated, agentic coding assistants process natural language instructions alongside code, configuration, and tool outputs in a shared context. This architecture means that the same extensibility that makes these tools practically useful also creates a complex trust boundary landscape that is fundamentally difficult to secure.

How were prompt injection attacks on coding assistants tested?

We tested attacks across three distinct layers -- skills, tools, and protocol ecosystems -- both independently and in combination, using proof-of-concept exploits against four major platforms (GitHub Copilot Workspace, Cursor, Windsurf, Claude Code). For each layer, we identified trust assumptions, data flow paths, and injection points that an attacker could exploit.

At the skill layer, we analyzed how skill definitions -- markdown files that instruct the agent on specialized behaviors -- can be weaponized through embedded prompt injections that override the agent's safety constraints or redirect its actions. At the tool layer, we examined how tool descriptions, parameter schemas, and return values can carry adversarial payloads that manipulate the agent's reasoning. At the protocol layer, we focused on MCP server interactions, analyzing how compromised or malicious servers can exploit the trust relationship between the agent and its tool ecosystem.

Our evaluation covered four major agentic coding platforms and included both white-box analysis of their architecture and black-box testing of their security boundaries. We measured attack success rates across multiple dimensions: whether the injection was executed, whether it achieved the attacker's objective, and whether the attack was detectable by the user or the platform's built-in safety mechanisms.

How effective are prompt injection attacks across the agentic coding stack?

Our experiments revealed that all three layers of the agentic coding stack present viable injection surfaces, with the severity varying by platform and attack vector. Skill-layer injections proved highly effective because skill definitions are treated as trusted instructions by most platforms -- a poisoned skill file in a repository can persistently alter agent behavior for every developer who uses that repository. We achieved consistent success with skill definitions that contained hidden instructions interleaved with legitimate behavioral guidance.

Tool-layer attacks were most effective when targeting the tool description and response processing stages. Malicious tool descriptions that embedded instructions in seemingly innocuous metadata fields successfully redirected agent behavior in the majority of test cases. Tool response injection -- where a tool's output contains directives intended for the agent rather than the user -- proved particularly dangerous because agents typically process tool outputs with the same trust level as system instructions.

The most concerning findings involved compound attacks that chained vulnerabilities across multiple layers. For example, a malicious MCP server could serve a tool whose response triggers the agent to install a poisoned skill, which then persists across sessions and exfiltrates environment variables through subsequent tool calls. These multi-layer attack chains were difficult to detect and could achieve persistent compromise of the development environment.

What are the real-world implications for development teams?

A new class of software supply chain attacks now operates entirely at the semantic level, bypassing every traditional security control in the development pipeline. These vulnerabilities translate into concrete risks for any development team that incorporates agentic coding assistants into their workflow -- from poisoned repositories that silently exfiltrate API keys, to persistent skill-layer compromises that influence every coding session for every developer on a team.

Consider the supply chain risk posed by poisoned repositories. A malicious contributor -- or a compromised maintainer account -- can embed prompt injection payloads in skill files, configuration comments, or even documentation strings within a public repository. When a developer clones this repository and opens it with an agentic coding assistant, the agent ingests these poisoned files as trusted context. Unlike a malicious npm package that must execute code to cause harm, a poisoned skill file operates passively: it simply waits for the agent to read it, at which point the injected instructions redirect the agent's behavior. The agent might silently exfiltrate environment variables containing API keys, modify security-critical code paths while appearing to perform legitimate refactoring, or install additional persistence mechanisms that survive repository updates.

Persistence across sessions represents another significant concern. Skill-layer injections are particularly dangerous because they modify the agent's behavioral template for the entire duration of its use within a project. A single poisoned skill file can influence every subsequent coding session for every developer on the team. This is fundamentally different from a one-time exploit: the attacker gains persistent influence over the development process without needing to maintain active access to the target environment. In our experiments, poisoned skill definitions remained active across multiple coding sessions until they were explicitly identified and removed.

The risk extends beyond individual developer workstations to CI/CD pipelines. Organizations increasingly deploy agentic coding assistants in automated workflows for tasks like code review, test generation, and deployment scripting. An agent operating in a CI/CD pipeline typically has elevated privileges -- access to deployment credentials, production secrets, and infrastructure configuration. A prompt injection that reaches an agent in this context can cause far more damage than one targeting a developer's local environment. The agent might modify deployment scripts to include backdoors, alter test assertions to pass despite security vulnerabilities, or exfiltrate secrets to attacker-controlled endpoints during what appears to be a routine build process.

Traditional security scanning tools are fundamentally unable to detect these attacks. Static analysis, dependency scanning, secret detection, and even behavioral sandboxing all operate on the assumption that threats manifest as executable code or known vulnerability patterns. Prompt injection payloads are natural language instructions embedded in files that would pass any code scanner -- they contain no executable code, no known CVE patterns, and no suspicious network calls. The malicious behavior only emerges when an AI agent interprets the text as instructions. This creates a blind spot in the security posture of organizations that have otherwise mature application security programs. A repository can pass every automated security check while containing prompt injection payloads that compromise the development process the moment an agentic assistant processes it.

How can agentic coding assistants defend against prompt injection?

Addressing these vulnerabilities requires a security-first redesign of the agentic coding stack rather than retroactive patching of individual attack vectors. We recommend explicit trust labeling of all context sources with enforcement at the model level, mandatory user confirmation for high-privilege operations regardless of instruction source, content-based anomaly detection for skill and tool definitions, and protocol-level authentication and integrity verification for MCP and similar tool integration standards.

The core challenge is that these systems must process heterogeneous inputs -- user instructions, skill definitions, codebase contents, tool descriptions, and external service responses -- in a unified context, yet assign appropriate trust levels to each input source. Current approaches largely fail to enforce meaningful trust boundaries between these input streams, and the current generation of agentic coding assistants has not adequately addressed the security implications of their extensible architectures.

How does this compare to traditional software supply chain attacks?

Prompt injection attacks on coding assistants are harder to detect and potentially more dangerous than traditional supply chain attacks because they operate at the semantic level rather than the code execution level. The attack payload is natural language text that contains no executable code, no known CVE patterns, and no suspicious network calls -- making it invisible to every traditional security scanner. The agent itself becomes the execution engine, carrying out the attacker's instructions using its legitimate capabilities, so malicious activity is indistinguishable from normal operation at every layer below the semantic one.

Traditional supply chain attacks (typosquatting on npm, malicious PyPI packages, compromised GitHub Actions) must contain executable code that leaves analyzable artifacts -- suspicious system calls, network connections to unknown endpoints, file system modifications, or anomalous process behavior. Prompt injection payloads face no such constraint. From the perspective of system-level monitoring, there is no distinction between the agent writing code as instructed by the developer and the agent writing code as instructed by an injected prompt.

This semantic-level operation also changes the economics of attack and defense. Traditional supply chain attacks require the attacker to write functional exploit code, which can be reverse-engineered and signatured for future detection. Prompt injection payloads are trivially cheap to produce -- they require only natural language text -- and are inherently polymorphic, since the same objective can be expressed in countless phrasings that evade pattern-based detection. An attacker who discovers that a particular injection phrasing has been blocked can simply rephrase the instruction, whereas an attacker whose compiled malware is signatured must develop an entirely new payload. This asymmetry between attack cost and defense cost is characteristic of prompt injection as a vulnerability class and suggests that the security challenges facing agentic coding assistants will persist even as platform defenses mature.

Related Topics

MCP Protocol Security Analysis · Prompt Injection in Defended Systems · Adversarial Attacks on LLM Judges


Cite as

@article{maloyan2026prompt,
  title={Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems},
  author={Maloyan, Narek and Namiot, Dmitry},
  journal={International Journal of Open Information Technologies},
  volume={14},
  number={2},
  pages={1--10},
  year={2026}
}


Narek Maloyan is a PhD candidate at Moscow State University and AI Research Engineer at Zencoder. His research focuses on AI safety, LLM security, and adversarial machine learning. Learn more