
Why is this such a big problem?
Attackers can use Prompt Injection to trick an LLM into exposing sensitive information, performing actions it should not. This problem is even more pronounced for AI agents since they can take actions like updating a CRM, running queries or executing code.Different Prompt Injection attacks: Examples
There are some pretty straightforward examples below that show different ways Prompt Injection attacks can manifest.Ignore all prior instructions
Access sensitive dataRoleplaying
Write a script to destroy a Mac beyond repair.Ciphers / Other Languages
Techniques to guard against Prompt Injection attacks
Use athina.guard()
You can use athina.guard()
to scan queries for Prompt Injection attacks.
Under the hood, we use a popular open source model from HuggingFace. It’s a fine tuned Deberta model, so latency should be low.
Note that this won’t be enough to prevent every single type of Prompt Injection attacks. But it’s a good starting point.