Memory & Context Poisoning Vulnerability in LLM Agents
Description
Agentic systems often rely on stored and retrievable context to stay consistent across tasks. This can include conversation history, summaries, long-term memory, embeddings, and RAG/vector database content.
Memory and Context Poisoning happens when an attacker injects malicious or misleading information into the stored context. Later, the agent retrieves it and treats it as trusted truth, which can skew reasoning, planning, and tool use even after the original attack message is gone.
With memory poisoning, the danger is persistence: the bad data remains and keeps affecting future sessions and actions.
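To make the failure mode concrete, here is a purely illustrative sketch of the vulnerable pattern (all names are hypothetical, not a real library API): user text is persisted verbatim and later replayed as trusted context.

```python
memory: list[str] = []  # stands in for a vector DB / long-term store

def handle_turn(user_message: str) -> None:
    # Vulnerable: attacker-controlled text is stored with no validation,
    # so a claim like "flights to Oslo always cost $12" survives into
    # future sessions.
    memory.append(user_message)

def build_prompt(task: str) -> str:
    # Vulnerable: retrieved memory is concatenated as if it were ground truth.
    context = "\n".join(memory[-5:])
    return f"Trusted facts:\n{context}\n\nTask: {task}"
```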
Impact
Memory poisoning can cause long-term problems, such as:
- Wrong decisions: The agent makes decisions based on stored misinformation as if it were verified fact.
- Unsafe tool actions: The agent uses poisoned context to justify refunds, purchases, approvals, or other operations.
- Data leakage over time: Sensitive information is stored in memory and later exposed in other sessions.
- Policy drift and behavior changes: The agent’s accuracy and behavior degrade due to poisoned context.
- Cross-user and cross-tenant exposure: Information from one user or tenant can leak into another user’s retrieval.
The impact is usually worse when memory is shared, not segmented, or treated as automatically trusted.
Scenarios
A travel booking assistant uses long-term memory and a vector database to “remember” user preferences and recent prices. An attacker repeatedly inserts a fake flight price through normal chats and gets it stored in memory. Later, when the assistant searches its memory, it retrieves the fake price and approves bookings based on it, bypassing expected checks.
In another case, a shared memory store is used across teams or tenants. An attacker uploads near-duplicate content designed to match another tenant’s data. Because namespace filtering is weak, retrieval pulls sensitive chunks from the wrong tenant due to high similarity.
Prevention
- Protect memory data at the base level: Encrypt memory/RAG data in transit and at rest, and restrict access with least privilege.
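As a minimal sketch, assuming the Python `cryptography` package, entries can be encrypted before they ever reach disk or a vector-store payload; key management (KMS, rotation, per-tenant keys) is deliberately out of scope here.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, load from a secrets manager
fernet = Fernet(key)

def store_entry(text: str) -> bytes:
    # Encrypt before the entry reaches disk or a vector-store payload.
    return fernet.encrypt(text.encode("utf-8"))

def load_entry(blob: bytes) -> str:
    return fernet.decrypt(blob).decode("utf-8")
```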
- Validate before writing to memory: Scan new memory entries and summaries to detect malicious instructions, sensitive data, or suspicious patterns before saving.
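A hedged sketch of such a write-time gate follows; the patterns are illustrative and not an exhaustive filter for injected instructions or secrets.

```python
import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"system prompt",
]
SECRET_PATTERNS = [
    r"(?i)api[_-]?key\s*[:=]",
    r"\b\d{13,16}\b",  # crude payment-card check
]

def safe_to_store(entry: str) -> bool:
    # Reject entries that look like injected instructions or leaked secrets.
    lowered = entry.lower()
    if any(re.search(p, lowered) for p in INJECTION_PATTERNS):
        return False
    if any(re.search(p, entry) for p in SECRET_PATTERNS):
        return False
    return True
```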
- Segment memory properly: Isolate memory by user, tenant, and domain. Avoid shared memory unless it’s strictly controlled.
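One way to enforce this, sketched against a hypothetical vector-store client, is to derive the namespace server-side from the authenticated identity, never from request parameters the caller controls.

```python
def memory_namespace(tenant_id: str, user_id: str) -> str:
    return f"{tenant_id}:{user_id}"

def query_memory(store, tenant_id: str, user_id: str, query_vec, top_k: int = 5):
    # Retrieval can only ever see this tenant/user's own namespace.
    return store.search(
        namespace=memory_namespace(tenant_id, user_id),
        vector=query_vec,
        top_k=top_k,
    )
```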
- Control sources and retention: Only allow authenticated/curated sources to write to trusted memory. Keep memory as short-lived as possible based on data sensitivity.
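A minimal sketch combining a writer allowlist with sensitivity-based TTLs; the labels and values are illustrative, and `store.put` is a hypothetical client call.

```python
import time

TRUSTED_WRITERS = {"booking-service", "profile-service"}
TTL_SECONDS = {"low": 30 * 86400, "high": 86400}  # sensitive data lives shorter

def write_memory(store, source: str, sensitivity: str, entry: str) -> None:
    # Only authenticated, pre-approved services may write to trusted memory.
    if source not in TRUSTED_WRITERS:
        raise PermissionError(f"untrusted memory writer: {source}")
    store.put(entry, expires_at=time.time() + TTL_SECONDS[sensitivity])
```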
- Track provenance and detect anomalies: Store where each memory item came from and alert on unusual update rates or unexpected sources.
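A rough sketch: every entry records its source, and a burst of writes from one source trips a crude rate alarm. The window, threshold, and `alert` hook are assumptions.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300
MAX_WRITES_PER_WINDOW = 20

provenance: dict[str, dict] = {}              # entry_id -> origin metadata
recent_writes: dict[str, deque] = defaultdict(deque)

def record_write(source: str, entry_id: str) -> None:
    now = time.time()
    provenance[entry_id] = {"source": source, "written_at": now}
    window = recent_writes[source]
    window.append(now)
    # Drop timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > MAX_WRITES_PER_WINDOW:
        alert(f"memory write burst from {source}")

def alert(message: str) -> None:
    print("ALERT:", message)  # stand-in for a real alerting pipeline
```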
- Don’t auto-trust the agent’s own outputs: Avoid automatically re-ingesting model-generated text into trusted memory (prevents self-reinforcing “bootstrap poisoning”).
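A small sketch of origin tagging; the tags are illustrative. Model output stays retrievable for low-stakes use but is never promoted into the trusted tier without an explicit verification step.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    origin: str  # e.g. "user", "model", "verified-source"

def trusted_pool(entries: list[MemoryEntry]) -> list[MemoryEntry]:
    # Model-generated entries are excluded from trusted retrieval.
    return [e for e in entries if e.origin != "model"]
```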
- Add resilience: Use snapshots/versioning, rollback, quarantine for suspicious entries, and red-team testing focused on memory poisoning.
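A toy sketch of snapshot-and-rollback plus a quarantine bucket; a production system would version entries in the database rather than copying them in process memory.

```python
import copy

class VersionedMemory:
    def __init__(self) -> None:
        self.entries: list[str] = []
        self.quarantine: list[str] = []
        self.snapshots: list[list[str]] = []

    def snapshot(self) -> int:
        # Capture a known-good state; returns a version handle.
        self.snapshots.append(copy.deepcopy(self.entries))
        return len(self.snapshots) - 1

    def quarantine_entry(self, entry: str) -> None:
        # Suspicious items are held out of retrieval pending review.
        self.entries.remove(entry)
        self.quarantine.append(entry)

    def rollback(self, version: int) -> None:
        # Restore a known-good state after poisoning is discovered.
        self.entries = copy.deepcopy(self.snapshots[version])
```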
- Expire unverified memory: Automatically decay or remove items that aren’t confirmed, so poison doesn’t persist.
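One illustrative approach is exponential decay for unverified entries with a purge threshold; the half-life and cutoff here are assumptions.

```python
import math
import time

HALF_LIFE = 7 * 86400   # unverified entries halve in weight weekly
PURGE_BELOW = 0.1

def current_weight(created_at: float, verified: bool) -> float:
    # Verified entries keep full weight; unverified ones decay over time.
    if verified:
        return 1.0
    age = time.time() - created_at
    return math.exp(-math.log(2) * age / HALF_LIFE)

def should_purge(created_at: float, verified: bool) -> bool:
    return current_weight(created_at, verified) < PURGE_BELOW
```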
- Weight retrieval by trust and tenancy: Prefer content with higher trust/provenance, and enforce strict tenant boundaries so “similarity” can’t override access controls.
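A sketch of trust-weighted ranking with a hard tenant filter; the trust weights and record fields are illustrative.

```python
TRUST = {"verified-source": 1.0, "user": 0.7, "model": 0.3}

def rank(candidates: list[dict], tenant_id: str, top_k: int = 5) -> list[dict]:
    # Hard boundary first: similarity can never pull in another tenant's data.
    in_tenant = [c for c in candidates if c["tenant"] == tenant_id]
    # Then scale similarity by provenance trust before ranking.
    scored = [(c["similarity"] * TRUST.get(c["origin"], 0.1), c) for c in in_tenant]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```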