Model Denial of Service Vulnerability in LLMs
Description
Because large language models are computationally expensive to run, an attacker can interact with an LLM in ways that cause high resource consumption, degrading service availability and response quality both for the attacker's own session and for other users. More sophisticated attacks abuse the context window, which is the maximum amount of text the model can consider at a single time when generating a response.
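To make the resource angle concrete, here is a minimal sketch of how a service might budget input against a context window. The 4-characters-per-token heuristic, the 8,192-token window, and the function names are illustrative assumptions, not values from any specific model:

```python
# Illustrative sketch only: the token heuristic and window size below are
# assumptions, not parameters of any real model.

CONTEXT_WINDOW_TOKENS = 8_192  # assumed model context limit


def approx_tokens(text: str) -> int:
    """Rough token estimate using a common ~4-characters-per-token heuristic."""
    return max(1, len(text) // 4)


def fits_in_context(prompt: str, reserved_for_reply: int = 512) -> bool:
    """Check whether a prompt leaves room in the context window for a reply."""
    return approx_tokens(prompt) + reserved_for_reply <= CONTEXT_WINDOW_TOKENS


normal = "When does flight BA117 depart?"
oversized = "x" * 100_000  # an attacker-crafted oversized request

print(fits_in_context(normal))     # a small prompt fits easily
print(fits_in_context(oversized))  # exceeds the window; compute cost grows with length
```

Since per-request compute grows with input length, requests that approach or exceed the window are disproportionately expensive, which is exactly what an attacker exploits.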
Impact
Denial of Service attacks can significantly degrade performance, making the service slow or entirely unavailable to legitimate users. They can also drive up operational costs through increased computational resource consumption and force additional spending on security measures and system recovery.
Scenarios
Imagine an airport's LLM-powered travel assistant that monitors flight data and gives customers real-time updates on delays and cancellations.
In this scenario, an attacker could craft and submit oversized requests to the assistant, over-utilizing its resources and degrading service quality. Flight updates could then be processed late or not at all, causing customers to miss their flights.
Prevention
- API rate limiting - Limit the number of requests accepted from each user session or IP address within a given timeframe.
- Sanitization and validation - Use filters and validation techniques to reject malicious or oversized inputs before they reach the model.
- Step and query-based resource caps - For complex queries processed in steps or stages, throttle execution and cap the resources allocated per step so a single request cannot monopolize compute.
- Monitoring - Watch for usage spikes that might indicate DoS or other nefarious activity, and keep developers informed about the methods attackers use to target LLMs with DoS.