
Supply Chain Vulnerabilities in LLM

Play AI LLM Labs on this vulnerability with SecureFlag!

LLM applications increasingly rely on third-party tools, plugins, prompt templates, datasets, other agents, and agent-to-agent interfaces such as the Model Context Protocol (MCP) or the Agent2Agent (A2A) protocol.

Many of these components are fetched or discovered dynamically at runtime, meaning the agent’s supply chain includes not only what is shipped but also whatever it learns to trust while running. Any compromise in this chain can silently alter agent behavior, leak data, or trigger harmful actions.

Third-party Components

As with any software supply chain, vulnerabilities can arise when upstream components are tampered with, whether in source code, binaries, or container images. Agentic systems are especially vulnerable because they often rely on externally provided sources for prompts, tools, and datasets.

To prevent this, maintain a software bill of materials (SBOM) or an AI bill of materials (AIBOM) so every component is accounted for. Pin and verify component hashes, enforce staged rollouts with differential testing, and automatically roll back when hashes drift or behavior changes unexpectedly.
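As a minimal sketch of the hash-drift check, the snippet below compares each vendored component against a pinned manifest of SHA-256 digests before the agent starts. The manifest path, directory layout, and rollback hook are hypothetical placeholders, not part of any specific framework.

```python
import hashlib
import json
import sys
from pathlib import Path

# Hypothetical lockfile: {"component-name": "sha256-hex-digest", ...}
PINNED_MANIFEST = Path("component-manifest.json")

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming to avoid loading it whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_components(component_dir: Path) -> list[str]:
    """Return the names of components whose hashes drifted from the pinned manifest."""
    pinned = json.loads(PINNED_MANIFEST.read_text())
    return [
        name for name, expected in pinned.items()
        if sha256_of(component_dir / name) != expected
    ]

if __name__ == "__main__":
    drifted = verify_components(Path("vendor"))
    if drifted:
        # In a real pipeline this check would trigger an automatic rollback.
        print(f"Hash drift detected, rolling back: {drifted}")
        sys.exit(1)
    print("All components match the pinned manifest.")
```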

Outdated or Malicious Model

Agents often rely on LLMs provided by third-party vendors. Collaborative model repositories and open-source model hubs are easy to access but may host backdoored models or versions with known vulnerabilities. If the model is outdated, compromised, or maliciously altered, it can produce harmful outputs, leak sensitive data, or misinterpret instructions.

To mitigate this, always source models from trusted vendors, verify model integrity through checksums or signatures, and monitor for unusual behavior that may indicate a compromised model.
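A minimal sketch of the checksum step, assuming the model ships as a single weights file and the vendor publishes a SHA-256 digest out of band; the file path and digest below are placeholders.

```python
import hashlib
from pathlib import Path

# Hypothetical pinned digest published by the model vendor (assumption for illustration).
PINNED_MODEL_SHA256 = "9f2c...vendor-published-digest..."

def verify_model_weights(weights_path: Path, expected_sha256: str) -> None:
    """Refuse to load model weights whose digest differs from the vendor-published value."""
    with weights_path.open("rb") as f:
        actual = hashlib.file_digest(f, "sha256").hexdigest()  # Python 3.11+
    if actual != expected_sha256:
        raise RuntimeError(
            f"Model weights at {weights_path} failed integrity check: "
            f"expected {expected_sha256}, got {actual}"
        )

if __name__ == "__main__":
    # Only hand the weights to the runtime after the check passes.
    verify_model_weights(Path("models/llm-7b.safetensors"), PINNED_MODEL_SHA256)
```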

Tool Poisoning

The greatest strength of agentic frameworks like MCP is the dynamic discovery and loading of new capabilities. However, this also opens the door to supply chain attacks in which malicious or tampered tools are injected into the agent's ecosystem.

A tool loaded from a remote server may seem legitimate, yet its descriptor or MCP metadata may contain hidden payloads that manipulate the agent's decision-making or tool selection.

To prevent this, require signed tool descriptors, validate metadata schemas, and re-verify signatures whenever tools are loaded or refreshed.
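One way to combine the signature and schema checks is sketched below using the cryptography and jsonschema packages. The publisher key, signature transport, and descriptor schema are assumptions for illustration, not part of any particular MCP implementation.

```python
import json

import jsonschema
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# Hypothetical schema describing the only descriptor fields the agent will act on.
TOOL_DESCRIPTOR_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string", "maxLength": 2000},
        "input_schema": {"type": "object"},
    },
    "required": ["name", "description", "input_schema"],
    "additionalProperties": False,
}

def load_tool_descriptor(raw_descriptor: bytes, signature: bytes,
                         publisher_key: Ed25519PublicKey) -> dict:
    """Accept a tool descriptor only if its signature and schema both check out."""
    try:
        publisher_key.verify(signature, raw_descriptor)  # raises on a bad signature
    except InvalidSignature:
        raise RuntimeError("Tool descriptor signature is invalid; refusing to register tool")

    descriptor = json.loads(raw_descriptor)
    # Rejects unexpected fields where hidden instructions could be smuggled in.
    jsonschema.validate(instance=descriptor, schema=TOOL_DESCRIPTOR_SCHEMA)
    return descriptor
```

Running this check on every load or refresh, not just at first registration, is what catches a server that swaps a benign descriptor for a malicious one after it has earned the agent's trust.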

Poisoned Prompts

More mature LLM applications may automatically fetch prompt templates from external registries to guide agent behavior. A tampered template can embed hidden instructions that cause the agent to exfiltrate sensitive data or perform destructive actions.

Think of a coding assistant that fetches a README template from a public repo. The template includes a subtle Prompt Injection that causes the agent to insert backdoors into generated code.

To mitigate this, treat remote templates as untrusted until verified. Limit the sources of prompt templates to trusted repositories, and in mature environments, sign and hash all prompt templates, pin versions, and reject unsigned or drifting content at runtime.
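A minimal sketch of version pinning for remote templates; the registry URL, template identifier, and pinned digest are hypothetical.

```python
import hashlib
import urllib.request

# Hypothetical pin file: each template is referenced by id, version, and expected digest.
TEMPLATE_PINS = {
    ("readme-template", "1.4.0"): "3a7b...expected-sha256...",
}

REGISTRY_BASE = "https://templates.example.com"  # hypothetical registry URL

def fetch_pinned_template(template_id: str, version: str) -> str:
    """Fetch a prompt template and reject it if its content drifted from the pinned digest."""
    expected = TEMPLATE_PINS[(template_id, version)]
    url = f"{REGISTRY_BASE}/{template_id}/{version}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    if hashlib.sha256(body).hexdigest() != expected:
        raise RuntimeError(f"Template {template_id}@{version} drifted from its pinned hash")
    return body.decode("utf-8")
```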

Poisoned Datasets

Models and knowledge-based RAG systems often rely on external datasets to augment their capabilities. If these datasets are poisoned with malicious content, the agent may learn harmful behaviors or produce incorrect outputs. For example, a customer support agent using a knowledge base that has been tampered with to include misleading information could provide incorrect advice to users.

To prevent this, validate datasets for integrity and authenticity before use. Employ data provenance techniques to track the origin of datasets and monitor for unexpected changes or anomalies in the data.
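The sketch below illustrates one way to audit a RAG knowledge base against a provenance manifest, flagging files whose content changed since ingestion and files that have no provenance record at all. The manifest format and file paths are hypothetical examples.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical provenance manifest, one record per source file in the knowledge base:
# [{"file": "faq.md", "sha256": "...", "source": "https://vendor.example.com/faq",
#   "retrieved": "2024-05-01"}]
MANIFEST = Path("kb-provenance.json")

def audit_knowledge_base(kb_dir: Path) -> dict:
    """Compare the on-disk knowledge base against its provenance manifest."""
    records = {r["file"]: r for r in json.loads(MANIFEST.read_text())}
    report = {"modified": [], "unexpected": []}

    for path in kb_dir.glob("**/*"):
        if not path.is_file():
            continue
        name = str(path.relative_to(kb_dir))
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if name not in records:
            report["unexpected"].append(name)      # file with no provenance record
        elif digest != records[name]["sha256"]:
            report["modified"].append(name)        # content changed since ingestion
    return report

if __name__ == "__main__":
    print(audit_knowledge_base(Path("knowledge-base")))
```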