Use of Hard-Coded Secrets in LLM

Play AI LLM Labs on this vulnerability with SecureFlag!

LLM applications can unintentionally disclose confidential information through their outputs. This may include sensitive training data, secrets provided at runtime, or protected resources exposed through poorly secured integrations. Such leaks can result in privacy violations, intellectual-property loss, regulatory penalties, and long-term erosion of user trust.

Training Data Leakage

The model has been trained on knowledge sources that were not properly redacted or validated, which allows it to reproduce personal, regulated, or proprietary information in its responses.

To mitigate this risk, all training corpora should pass through strict sanitization and redaction pipelines with automated PII detection, and production databases should never be used directly for training without anonymization and formal review.

Prompt Injection Leakage

An attacker injects malicious instructions or embeds sensitive values into prompts, causing the model to reveal internal context, system instructions, or confidential data provided earlier in the conversation.

This can be reduced by separating system instructions from user input, applying pattern-based filtering to prompts, and enforcing guardrails that prevent the model from disclosing restricted internal context.

Agent / Secret Leakage

Agentic or autonomous systems mistakenly pass API keys, credentials, or internal secrets directly into the model context, where they may be echoed back in responses or captured in logs.

Secrets should instead be stored in secure vaults and injected only at execution layers outside the LLM, with the model receiving only abstract references or short-lived tokens.

Unauthenticated or Unauthorized Data Access

LLM APIs, MCP servers, or data backends are deployed without proper authentication or authorization controls, enabling attackers to query or extract sensitive information.

All model-adjacent services must enforce strong authentication, role-based authorization, and network-level protections.

RAG Exploitation via Vectors and Embeddings

In Retrieval Augmented Generation systems, attackers can abuse vector similarity mechanisms and embedding models to infer or reconstruct sensitive source documents by issuing carefully crafted queries that progressively surface protected content. Over time, this can enable the extraction of private records or proprietary material even without direct access to the underlying data store.

This risk can be mitigated by restricting which documents are eligible for embedding, encrypting or obfuscating sensitive content before vectorization, applying access controls and query-rate limits on vector search endpoints, and monitoring for abnormal similarity-query patterns indicative of extraction attacks.

References

OWASP - Top 10 for LLMs