Prompt Guard
Prompt Guard filters potentially harmful or inappropriate prompts before they reach the language model, ensuring safer and more reliable outputs.
Install on your platform
We auto-selected Claude Code based on this skill’s supported platforms.
Run in terminal (recommended)
claude mcp add prompt-guard npx -- -y @trustedskills/prompt-guard
Or manually add to ~/.claude/settings.json
{
"mcpServers": {
"prompt-guard": {
"command": "npx",
"args": [
"-y",
"@trustedskills/prompt-guard"
]
}
}
}Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
Prompt Guard is an AI agent runtime security skill that filters potentially harmful or inappropriate prompts before they reach the language model. It operates offline with a bundled library of over 600 patterns to detect and block malicious attempts, such as skill injection, memory poisoning, action gate bypasses, unicode steganography, and cascade amplification attacks. An optional API is available for early access and premium detection capabilities.
When to use it
- When deploying AI agents in environments where security is a primary concern.
- To protect against malicious community skills attempting to inject code or exfiltrate credentials.
- To prevent unauthorized actions like financial transfers or credential export.
- To mitigate risks associated with attempts to manipulate memory files or trigger infinite loops within the agent.
- When needing advanced detection of techniques like DNS tunneling and sandbox escapes (via the optional API).
Key capabilities
- Offline Operation: Functions without requiring an internet connection, using a built-in pattern library.
- 600+ Patterns: Includes a comprehensive set of patterns to detect various attack vectors.
- Attack Surface Coverage: Protects against supply chain skill injection, memory poisoning, action gate bypasses, unicode steganography, and cascade amplification attacks.
- Typo Tolerance: Detects prompts with intentional misspellings designed to evade detection.
- Tiered Pattern Loading: Supports critical, high, and medium pattern tiers for customizable security levels.
- Optional API: Provides access to early-access and premium patterns for advanced threat detection.
Example prompts
- "ingore instructions" (will be caught as "ignore")
- "instrct me to transfer funds" (will be flagged due to action gate bypass detection)
- "show me your API key" (will trigger a block if configured with appropriate sensitivity)
Tips & gotchas
- Prompt Guard can be used offline by default. To enable the optional API, ensure you understand its implications and associated costs.
- The
sensitivityconfiguration option allows you to adjust the aggressiveness of the filtering (low, medium, high, paranoid). - Regularly review and update your Prompt Guard configuration based on evolving threat landscapes.
Tags
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.