Supply Chain Vulnerabilities in LLMs
Description
Supply chain risk in traditional software usually centers on vulnerabilities in third-party components, or in any stage of the software development lifecycle that depends on external components, from coding and testing through deployment.
Machine learning and LLMs add further risk, however: training data sourced from public sources and third-party vendors, pre-trained models, and LLM plugin extensions can each be poisoned or tampered with.
Impact
Supply chain vulnerabilities can severely compromise the integrity and security of LLM applications. If attackers manage to poison the data or components used in training, the resulting LLM may perform worse, or other security controls, such as alignment achieved through fine-tuning, may be weakened. Insecure or malicious plugins can lead to data breaches and unauthorized access to sensitive information.
Scenarios
LLM applications often use software components, called plugins, that the model automatically calls during user interactions. They are used to interact with the external environment, for example, to ingest live data from websites or APIs.
Because plugins are commonly written by external developers, they are a significant weak point in the supply chain. Attackers or malicious maintainers can embed harmful code in a plugin, which compromises the security and integrity of any LLM application that relies on it and can result in a data breach once the malicious code runs inside the application.
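One way to limit this exposure is to refuse to load or call any plugin that has not been explicitly vetted. The following is a minimal sketch of that idea. The `PluginRegistry` class, the `PluginNotAllowed` exception, and the plugin names are hypothetical and are not taken from any particular LLM framework.

```python
from typing import Callable

# Hypothetical plugin signature for this sketch: text in, text out.
Plugin = Callable[[str], str]


class PluginNotAllowed(Exception):
    """Raised when a plugin is not on the vetted allowlist."""


class PluginRegistry:
    """Only registers and dispatches plugins whose names were vetted in advance."""

    def __init__(self, allowlist: set[str]) -> None:
        # Freeze the allowlist so it cannot be extended at runtime.
        self._allowlist = frozenset(allowlist)
        self._plugins: dict[str, Plugin] = {}

    def register(self, name: str, func: Plugin) -> None:
        if name not in self._allowlist:
            raise PluginNotAllowed(f"plugin {name!r} has not been vetted")
        self._plugins[name] = func

    def call(self, name: str, argument: str) -> str:
        # The model's requested plugin name is untrusted input, so it is
        # looked up only among plugins that passed the allowlist check.
        plugin = self._plugins.get(name)
        if plugin is None:
            raise PluginNotAllowed(f"plugin {name!r} is not registered")
        return plugin(argument)
```

With `PluginRegistry({"weather"})`, registering or calling a plugin named `scraper` raises `PluginNotAllowed`, while a registered `weather` plugin is dispatched normally. An allowlist of names only controls which plugins can run. It does not detect a vetted plugin whose code was later altered.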
Prevention
- Patch and update out-of-date components in LLM apps and systems.
- Evaluate the security of the third-party components, including plugins and libraries used by your application, and host them within code repositories you manage.
- Inspect training data and verify its integrity, searching for falsified or malicious input to prevent training data poisoning.
- Maintain an inventory of components, for example with a Software Bill of Materials (SBOM).
- Use code signing for external data and models to confirm their integrity.
- Audit suppliers' access to data and models to maintain a strong security posture.
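The inventory and integrity checks above can be combined in a simple form: record a SHA-256 digest for each external artifact when it is vetted, then verify the files against that record before use. The sketch below assumes a hypothetical inventory mapping file names to hex digests, and `verify_artifacts` is an illustrative helper. A pinned digest is not a substitute for code signing or a full SBOM: it detects modification but does not prove who published the artifact.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(inventory: dict[str, str], root: Path) -> list[str]:
    """Return the names of artifacts that are missing or do not match.

    `inventory` maps a file name (relative to `root`) to the SHA-256 hex
    digest recorded when the artifact was vetted. This format is an
    assumption made for the sketch.
    """
    failures = []
    for name, expected in inventory.items():
        path = root / name
        if not path.is_file():
            failures.append(name)
            continue
        # compare_digest avoids leaking match position through timing.
        if not hmac.compare_digest(sha256_of(path), expected.lower()):
            failures.append(name)
    return failures
```

An empty result means every listed artifact is present and unchanged. Any returned name should block loading of that model or dataset until it has been re-vetted.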