
Excessive Agency Vulnerability in LLM

Description

In LLM-based systems, developers often give the model some leeway when it interacts with other systems and performs actions based on input prompts. This can include letting the model decide which functions to call via plugins, tools, or extensions, depending on the user’s request or the model’s own previous output.

Problems arise when that leeway goes too far. Excessive Agency occurs when the model carries out harmful actions in response to unexpected, ambiguous, or manipulated outputs. The vulnerability may stem from hallucinations, direct or indirect prompt injection, or general model misbehavior.

Common root causes include giving the model too many capabilities, excessive access, or too much autonomy without adequate human oversight. Any of these can create serious risks to confidentiality, integrity, and availability.
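
To make that leeway concrete, the sketch below shows a common tool-dispatch pattern in which the model’s structured output, rather than the developer, selects which registered function runs. It is a minimal illustration; every function and name is hypothetical and not taken from any particular framework.

```python
import json

# Hypothetical tools an agent might be given; names are illustrative only.
def search_docs(query: str) -> str:
    return f"results for {query!r}"

def delete_record(record_id: str) -> str:
    return f"deleted {record_id}"  # destructive, yet just as reachable

TOOLS = {"search_docs": search_docs, "delete_record": delete_record}

def dispatch(model_output: str) -> str:
    """Execute whatever tool call the model emitted as JSON."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]        # no allow-list or policy check here
    return fn(**call["arguments"])  # runs with the application's privileges

# A manipulated or hallucinated model output executes like any other:
print(dispatch('{"name": "delete_record", "arguments": {"record_id": "42"}}'))
```

Nothing in dispatch() distinguishes a benign tool choice from a harmful one; the model’s output is the only authority, so every entry in the registry is reachable through a sufficiently crafted prompt.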

Impact

Excessive Agency can cause a wide range of issues across confidentiality, integrity, and availability. It might lead to unauthorized data changes, expose sensitive information, disrupt services, or trigger actions the system was never meant to take. The severity usually depends on the permissions and access levels granted to the LLM and on the other systems it interacts with.

Scenarios

Take a financial institution that uses an LLM-powered application to manage accounts, generate personalized investment advice, and produce portfolio reports. It has integrated a third-party plugin for recommendations, but the plugin includes unused functionality, such as the ability to make payments.

During regular use, the LLM receives malformed prompts or manipulated input that leads it to invoke the payment feature based on a misinterpreted intent. Because the model has been granted a degree of leeway, and the unused plugin capability remains active, unauthorized fund transfers are made.

This example shows how excessive functionality and permissions can be exploited, and it is a reminder to apply strict controls, enforce least privilege, and require human approval when integrating plugins into LLM-based systems.
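
In code, the vulnerable setup might look like the following sketch; all names are hypothetical and stand in for whatever the third-party plugin actually exposes.

```python
# Illustrative sketch of the scenario above; all names are hypothetical.
def get_investment_advice(account_id: str) -> str:
    return f"advice for {account_id}"

def generate_portfolio_report(account_id: str) -> str:
    return f"report for {account_id}"

def make_payment(account_id: str, amount: float, payee: str) -> str:
    return f"sent {amount} from {account_id} to {payee}"

# The plugin registers every capability it ships with, so the payment
# function the application never uses remains live and model-reachable:
PLUGIN_FUNCTIONS = {
    "get_investment_advice": get_investment_advice,
    "generate_portfolio_report": generate_portfolio_report,
    "make_payment": make_payment,  # unused but active -> excessive agency
}
```

Exposing only the two read-only functions the application actually needs would remove this attack path entirely; the Prevention section below expands on that idea.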

Prevention

  • Limit plugin functionality: Restrict plugin permissions so that LLM agents can only access the specific functions they need, and avoid granting extra capabilities that might be misused (a combined sketch of these controls follows this list).

  • Avoid open-ended functions: Use plugins with focused, limited functions. Avoid broad operations like shell commands or open URL fetching to reduce the attack surface.

  • Track user authorization: Ensure actions performed on behalf of a user are done with the minimum necessary privileges within that user’s security scope to prevent unauthorized actions.

  • Human-in-the-loop control: Require human approval before executing any actions, either inside the plugin or in downstream systems, to improve oversight and reduce risk.

  • Authorization in downstream systems: Enforce authorization in downstream systems rather than relying solely on the LLM to decide whether an action is allowed. This reduces the risk of unauthorized access and actions.

  • Logging and monitoring: Monitor the plugin and downstream activity to quickly detect and respond to any suspicious or unauthorized actions.

  • Rate-limiting: Cap the number of actions an agent can perform within a given time period to limit the damage from malicious activity and enable early detection and response to suspicious behavior.
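
Several of these controls can be layered in a single guarded dispatcher. The sketch below is a minimal illustration, with hypothetical names throughout, of how an allow-list (least privilege), a per-user authorization check, a human-approval gate for state-changing tools, rate-limiting, and audit logging might sit in front of tool execution.

```python
import time
from collections import defaultdict

ALLOWED_TOOLS = {"get_investment_advice", "generate_portfolio_report"}
STATE_CHANGING = {"make_payment"}   # defense in depth: gated even if ever allow-listed
RATE_LIMIT = 5                      # max tool calls per user per minute
_recent_calls = defaultdict(list)   # user_id -> call timestamps

def user_may_call(user_id: str, tool: str) -> bool:
    """Stand-in for a real authorization check in the downstream system."""
    return tool in ALLOWED_TOOLS

def approved_by_human(user_id: str, tool: str, args: dict) -> bool:
    """Stand-in for an approval workflow (confirmation UI, ticket, etc.)."""
    return False  # deny by default in this sketch

def guarded_dispatch(user_id: str, tool: str, args: dict, tools: dict):
    now = time.time()
    recent = [t for t in _recent_calls[user_id] if now - t < 60]
    if len(recent) >= RATE_LIMIT:
        raise PermissionError("rate limit exceeded")           # rate-limiting
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} not permitted")  # allow-list
    if not user_may_call(user_id, tool):
        raise PermissionError("user lacks privilege")          # user scope
    if tool in STATE_CHANGING and not approved_by_human(user_id, tool, args):
        raise PermissionError("human approval required")       # human-in-the-loop
    _recent_calls[user_id] = recent + [now]
    print(f"AUDIT user={user_id} tool={tool} args={args}")     # logging/monitoring
    return tools[tool](**args)

# Example: only the allow-listed, read-only function goes through.
tools = {"get_investment_advice": lambda account_id: f"advice for {account_id}"}
print(guarded_dispatch("u1", "get_investment_advice", {"account_id": "A1"}, tools))
```

The key design choice is that policy lives outside the model: even a perfectly crafted malicious prompt can only reach what the dispatcher, not the LLM, permits.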

References

OWASP - Top 10 for LLMs