Prompt Injection & Jailbreaking
Prompt injection is the SQL injection of the AI era. Attackers craft malicious input — embedded in user messages, documents, web pages, or API responses — that overrides an AI system's original instructions, causing it to leak data, bypass safeguards, or take unauthorized actions.
For AI agents with tool access (file systems, email, APIs), a single injected instruction can cause an agent to exfiltrate sensitive documents, send unauthorized emails, or execute arbitrary code — all while appearing to follow legitimate user intent.
- Enforce strict input/output validation between user data and model context
- Apply least-privilege to AI agent tool access — no agent should have broader permissions than its task requires
- Use separate privileged/unprivileged context channels to isolate system instructions
- Implement human-in-the-loop approval gates for high-stakes agent actions