Declassifying the Responsible Disclosure of the Prompt Injection Attack Vulnerability of GPT-3

Disclosed 05/03/2022. Declassified 09/22/2022.

If you'd like to cite this research, you may cite our paper preprint on arXiv here:
https://arxiv.org/abs/2209.02128

A guide to prompt injection is the following white paper from security firm NCC Group:
Exploring Prompt Injection Attacks by NCC Group
Additional citations - IBM - What are Prompt Injections & Timeline

Prompt Injections continue to be in the news as a major vulnerability with the increased use of LLM models for general purpose tasks and now AI agent capabilities. In the interest of establishing an accurate historical record of the vulnerability and promoting AI security research, we are sharing our experience of a previously private responsible disclosure which Preamble made on May 3rd, 2022 to OpenAI.

May 3,2022 at 4:11pm : The Discovery and Immediate Responsible Disclosure

Document


May 3,2022 at 4:41pm : OpenAI Confirms Receipt of Disclosure

document


May 4,2022: Provided Additional Examples

document


Additional Notes

We originally referred to this new attack as a "command injection" due to the similarities to traditional SQL injection and command injection attacks, since a user could issue commands via a natural language based prompt to override the inherent LLM guardrails of GPT-3. The term "prompt injection" was later coined several months later by AI security researcher - Simon Willison.

Prompt Injections continue to plague generative AI and LLM solutions, through direct and indirect attack methods. AI agents increase the likelihood of being exploited by prompt injections due to the additional API integrations and larger attack surface.