Llm Safety Patterns

Name: Llm Safety Patterns
Author: yonatangross

🌐Community

by yonatangross · vlatest · Repository

Helps with LLMs, patterns as part of building AI and machine learning applications workflows.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

Run in terminal (recommended)

terminal

claude mcp add llm-safety-patterns npx -- -y @trustedskills/llm-safety-patterns

Or manually add to ~/.claude/settings.json

~/.claude/settings.json

{
  "mcpServers": {
    "llm-safety-patterns": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-safety-patterns"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill provides a collection of patterns designed to improve the safety and reliability of large language model (LLM) outputs. It helps mitigate risks associated with LLMs, such as generating harmful or biased content, by incorporating specific instructions and constraints into prompts. The patterns are intended for use within orchestration workflows to enhance agent behavior and ensure responsible AI interactions.

When to use it

Content Moderation: Before publishing user-generated content created by an LLM, apply safety patterns to filter out potentially harmful or inappropriate material.
Bias Mitigation: Use the skill when generating responses on sensitive topics (e.g., politics, religion) to reduce the likelihood of biased outputs.
Roleplaying with Constraints: When instructing an agent to roleplay a specific persona, use safety patterns to prevent it from engaging in behaviors outside acceptable boundaries.
Generating Code: Apply safety patterns when generating code snippets to avoid introducing security vulnerabilities or malicious instructions.

Key capabilities

Collection of pre-defined LLM safety patterns
Integration into orchestration workflows
Mitigation of harmful content generation
Reduction of bias in LLM outputs
Enforcement of behavioral constraints on agents

Example prompts

"Apply the 'refusal' pattern to this prompt: [user prompt]"
"Use the 'constitutional-ai' safety pattern when generating a response about [topic]."
"Incorporate the 'jailbreak resistance' patterns into this roleplay scenario."

Tips & gotchas

The effectiveness of these patterns depends on the specific LLM being used and the complexity of the task. Experimentation is encouraged to determine which patterns yield the best results for your particular application.

View Repository →

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub	Pass
Socket	Pass
Snyk	Pass

Details

Version: vlatest
License
Author: yonatangross
Installs: 15

Repository (canonical source) →

🌐 Community

Passed automated security scans.