Llm Safety Patterns
Helps with LLMs, patterns as part of building AI and machine learning applications workflows.
Install on your platform
We auto-selected Claude Code based on this skill’s supported platforms.
Run in terminal (recommended)
claude mcp add llm-safety-patterns npx -- -y @trustedskills/llm-safety-patterns
Or manually add to ~/.claude/settings.json
{
"mcpServers": {
"llm-safety-patterns": {
"command": "npx",
"args": [
"-y",
"@trustedskills/llm-safety-patterns"
]
}
}
}Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill provides a collection of patterns designed to improve the safety and reliability of large language model (LLM) outputs. It helps mitigate risks associated with LLMs, such as generating harmful or biased content, by incorporating specific instructions and constraints into prompts. The patterns are intended for use within orchestration workflows to enhance agent behavior and ensure responsible AI interactions.
When to use it
- Content Moderation: Before publishing user-generated content created by an LLM, apply safety patterns to filter out potentially harmful or inappropriate material.
- Bias Mitigation: Use the skill when generating responses on sensitive topics (e.g., politics, religion) to reduce the likelihood of biased outputs.
- Roleplaying with Constraints: When instructing an agent to roleplay a specific persona, use safety patterns to prevent it from engaging in behaviors outside acceptable boundaries.
- Generating Code: Apply safety patterns when generating code snippets to avoid introducing security vulnerabilities or malicious instructions.
Key capabilities
- Collection of pre-defined LLM safety patterns
- Integration into orchestration workflows
- Mitigation of harmful content generation
- Reduction of bias in LLM outputs
- Enforcement of behavioral constraints on agents
Example prompts
- "Apply the 'refusal' pattern to this prompt: [user prompt]"
- "Use the 'constitutional-ai' safety pattern when generating a response about [topic]."
- "Incorporate the 'jailbreak resistance' patterns into this roleplay scenario."
Tips & gotchas
The effectiveness of these patterns depends on the specific LLM being used and the complexity of the task. Experimentation is encouraged to determine which patterns yield the best results for your particular application.
Tags
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.