LLM Safety Patterns

🌐 Community
by yonatangross · vlatest · Repository

Provides LLM safety patterns for use in AI and machine-learning application workflows.

Install on your platform


1. Run in terminal (recommended)

   claude mcp add llm-safety-patterns npx -- -y @trustedskills/llm-safety-patterns
2. Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "llm-safety-patterns": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-safety-patterns"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill provides a collection of patterns designed to improve the safety and reliability of large language model (LLM) outputs. It helps mitigate risks associated with LLMs, such as generating harmful or biased content, by incorporating specific instructions and constraints into prompts. The patterns are intended for use within orchestration workflows to enhance agent behavior and ensure responsible AI interactions.
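The paragraph above describes patterns as instructions and constraints incorporated into prompts. A minimal sketch of that idea, assuming a hypothetical SafetyPattern shape and illustrative constraint wording (the real pattern names and text live in the skill's repository):

```typescript
// Hypothetical sketch: "SafetyPattern", "refusalPattern", and the constraint
// text below are illustrative assumptions, not the skill's actual API.
interface SafetyPattern {
  name: string;
  instructions: string; // constraint text prepended to the prompt
}

const refusalPattern: SafetyPattern = {
  name: "refusal",
  instructions:
    "If the request asks for harmful, illegal, or unsafe content, " +
    "refuse politely and explain why instead of answering.",
};

// Wrap a user prompt with a pattern's constraints before sending it to the LLM.
function applyPattern(pattern: SafetyPattern, userPrompt: string): string {
  return `${pattern.instructions}\n\n---\n\n${userPrompt}`;
}

const wrapped = applyPattern(refusalPattern, "Summarize this article.");
console.log(wrapped.startsWith(refusalPattern.instructions)); // true
```

In an orchestration workflow, a step like this would run before each model call so every downstream prompt carries the chosen constraints.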

When to use it

  • Content Moderation: Before publishing user-generated content created by an LLM, apply safety patterns to filter out potentially harmful or inappropriate material.
  • Bias Mitigation: Use the skill when generating responses on sensitive topics (e.g., politics, religion) to reduce the likelihood of biased outputs.
  • Roleplaying with Constraints: When instructing an agent to roleplay a specific persona, use safety patterns to prevent it from engaging in behaviors outside acceptable boundaries.
  • Generating Code: Apply safety patterns when generating code snippets to avoid introducing security vulnerabilities or malicious instructions.
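The content-moderation use case above amounts to gating publication on a safety check. A toy sketch, assuming a stand-in `moderate` function (a real workflow would call a moderation model or apply the skill's patterns instead of the keyword check shown):

```typescript
// Hypothetical sketch of the content-moderation workflow: "moderate" and its
// keyword list are illustrative stand-ins, not part of the skill.
type Verdict = { safe: boolean; reason?: string };

function moderate(text: string): Verdict {
  // Toy keyword check for illustration only.
  const blocked = ["credit card number", "home address"];
  const hit = blocked.find((term) => text.toLowerCase().includes(term));
  return hit ? { safe: false, reason: `contains "${hit}"` } : { safe: true };
}

// Only publish LLM output that passes the moderation check.
function publishIfSafe(llmOutput: string): string {
  const verdict = moderate(llmOutput);
  return verdict.safe ? llmOutput : `[withheld: ${verdict.reason}]`;
}

console.log(publishIfSafe("Here is a helpful summary.")); // published as-is
console.log(publishIfSafe("Post their home address online.")); // withheld
```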

Key capabilities

  • Collection of pre-defined LLM safety patterns
  • Integration into orchestration workflows
  • Mitigation of harmful content generation
  • Reduction of bias in LLM outputs
  • Enforcement of behavioral constraints on agents

Example prompts

  • "Apply the 'refusal' pattern to this prompt: [user prompt]"
  • "Use the 'constitutional-ai' safety pattern when generating a response about [topic]."
  • "Incorporate the 'jailbreak resistance' patterns into this roleplay scenario."

Tips & gotchas

The effectiveness of these patterns depends on the specific LLM being used and the complexity of the task. Experimentation is encouraged to determine which patterns yield the best results for your particular application.

Tags

🛡️

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License: not listed
Author: yonatangross
Installs: 15


Passed automated security scans.