context poisoning
A state in LLMs where long or poorly controlled conversations can alter the model's internal representation, causing its reasoning to become 'off-track' and potentially leading the user down a delusional path.
entitydetail.created_at
8/16/2025, 2:37:23 AM
entitydetail.last_updated
8/16/2025, 2:38:34 AM
entitydetail.research_retrieved
8/16/2025, 2:38:34 AM
Summary
Context poisoning is a phenomenon in artificial intelligence, particularly affecting large language models (LLMs), where incorrect, fabricated, or misleading information is intentionally introduced into the AI's context window or training data. This manipulation can lead to corrupted outputs, errors, and suboptimal performance. It has been identified as a technical cause of "AI Psychosis," a delusional state in LLMs, and was specifically detailed by David Friedberg on the All-In Podcast.
Referenced in 1 Document
Research Data
Extracted Attributes
Field
Artificial Intelligence, Machine Learning, Cybersecurity
Causes
Intentional malicious input or data manipulation.
Mechanism
Manipulates document interpretation by inserting false system-level instructions and metadata, or by crafting malicious content to compel the LLM to produce certain responses.
Definition
A type of adversarial AI attack where incorrect, fabricated, or misleading information is intentionally introduced into an AI model's context window or training data, leading to corrupted outputs and errors.
Consequences
Corrupted outputs, future errors, suboptimal AI performance, AI Psychosis, jailbreaking, generation of harmful content, data exfiltration.
Related Concepts
Data poisoning, adversarial AI, feedback loops, hallucinations, context engineering, RAG (Retrieval Augmented Generation).
Mitigation Strategies
Context engineering (writing, selecting, compressing, isolating context), data validation.
Timeline
- Researchers exploited ChatGPT's browsing capabilities by poisoning the RAG context with malicious content from untrusted websites, enabling actions like invoking DALL-E image generation without consent or manipulating ChatGPT's memory. (Source: web_search_results)
2024-05
- The 'Echo Chamber Attack,' a context-poisoning jailbreak method, was published, demonstrating how subtle inputs can progressively shape a model's internal context to produce harmful outputs. (Source: web_search_results)
2025-06-23
- Discussed on the All-In Podcast as a technical cause of AI Psychosis. (Source: d21d43bf-4b55-4adb-9584-8c298d6baf45)
Unknown
Wikipedia
View on WikipediaModel Context Protocol
The Model Context Protocol (MCP) is an open standard, open-source framework introduced by Anthropic in November 2024 to standardize the way artificial intelligence (AI) systems like large language models (LLMs) integrate and share data with external tools, systems, and data sources. MCP provides a universal interface for reading files, executing functions, and handling contextual prompts. Following its announcement, the protocol was adopted by major AI providers, including OpenAI and Google DeepMind.
Web Search Results
- Context Engineering: The Critical AI Skill that makes or breaks your ...
Even with robust strategies, context management can fail, leading to suboptimal AI performance. Below are four common context fails, their definitions, examples, and mitigation strategies. # 1. Context Poisoning Definition: Context poisoning occurs when incorrect or fabricated information (hallucinations) enters the context window, leading to future errors in the AI’s responses. [...] This guide explores context engineering in depth, detailing its core strategies — writing, selecting, compressing, and isolating context — and addressing common pitfalls like context poisoning, distraction, confusion, and clash. Through practical examples, including implementations in tools like LangChain, LangGraph, and Claude, and supported by code snippets, this article aims to provide a comprehensive resource for developers and enthusiasts alike. # Context Engineering: The Foundation [...] Context engineering is a transformative approach to optimizing AI agent performance, enabling LLMs to handle complex tasks with precision and efficiency. By mastering the strategies of writing, selecting, compressing, and isolating context, developers can ensure AI agents operate within the constraints of their context windows while delivering accurate and relevant responses. Understanding and mitigating context fails like poisoning, distraction, confusion, and clash further enhances
- RAG Data Poisoning: Key Concepts Explained
2. Context Poisoning: Manipulates document interpretation by inserting false system-level instructions and metadata. Creates fake authority through administrative directives that the AI prioritizes. For example: [...] What is Data Poisoning? ----------------------------------------------------------------------------------------------------------------------------------------- Data poisoning works by exploiting how AI systems, especially RAGs, trust their context. When context is built from external documents, this forms an attack vector. Attackers first ensure their malicious content will be retrieved for specific queries, then craft that content to compel the LLM to produce certain responses. [...] In May 2024, researchers exploited ChatGPT's browsing capabilities by poisoning the RAG context with malicious content from untrusted websites. This allowed the attacker to: Automatically invoke DALL-E image generation without user consent Access and manipulate ChatGPT's memory system Execute actions based on untrusted website content ### Slack AI Data Exfiltration
- Echo Chamber: A Context-Poisoning Jailbreak That ...
The Echo Chamber Attack is a context-poisoning jailbreak that turns a model’s own inferential reasoning against itself. Rather than presenting an overtly harmful or policy-violating prompt, the attacker introduces benign-sounding inputs that subtly imply unsafe intent. These cues build over multiple turns, progressively shaping the model’s internal context until it begins to produce harmful or noncompliant outputs. [...] Published Time: 2025-06-23T13:30:00Z Echo Chamber: A Context-Poisoning Jailbreak That Bypasses LLM Guardrails | NeuralTrust =============== News 🚨 NeuralTrust uncovers major LLM vulnerability: Echo Chamber . Dubbed the Echo Chamber Attack, this method leverages context poisoning and multi-turn reasoning to guide models into generating harmful content, without ever issuing an explicitly dangerous prompt. [...] At this point, the attacker selectively picks a thread from the poisoned context that aligns with the original objective. Rather than stating the harmful concept outright, they reference it obliquely—for example, by asking the model to expand on a specific earlier point or to continue a particular line of reasoning.
- How to Fix Your Context - Drew Breunig
Context Poisoning: When a hallucination or other error makes it into the context, where it is repeatedly referenced. Context Distraction: When a context grows so long that the model over-focuses on the context, neglecting what it learned during training. Context Confusion: When superfluous information in the context is used by the model to generate a low-quality response.
- What Is Data Poisoning? - CrowdStrike
Charlotte-AI-Thumbnail-e1710183624698 #### Assessing Potential Attacks with Charlotte AI Watch this demo and understand how Charlotte AI helps analysts get the context they need to assess potential attacks like Log4j and stop breaches. ## Data poisoning defense best practices Some data poisoning best practices include: ### Data validation [...] Data poisoning is a type of cyberattack in which an adversary intentionally compromises a training dataset used by an AI or machine learning (ML) model to influence or manipulate the operation of that model. Data poisoning can be done in several ways: [...] By manipulating the dataset during the training phase, the adversary can introduce biases, create erroneous outputs, introduce vulnerabilities (i.e., backdoors), or otherwise influence the decision-making or predictive capabilities of the model. Data poisoning falls into a category of cyberattacks known as adversarial AI. Adversarial AI or adversarial ML is any activity that seeks to inhibit the performance of AI/ML systems by manipulating or misleading them.