LLM Judge

🌐 Community
by existential-birds · vlatest · Repository

Provides LLMs guidance and assistance for building AI and machine learning applications.

Install on your platform


1. Run in terminal (recommended)

claude mcp add llm-judge npx -- -y @trustedskills/llm-judge

2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "llm-judge": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-judge"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

The llm-judge skill enables AI agents to evaluate other models' outputs against specific criteria, acting as an automated quality control layer. It allows a primary agent to generate responses while a secondary "judge" model scores them for accuracy, tone, or adherence to constraints before final delivery. This creates a self-correcting loop that significantly improves reliability in complex workflows.
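The generate-then-judge flow described above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the skill's actual API: `generate`, `judge`, and the verdict format are hypothetical stand-ins.

```python
# Sketch of a generator/judge quality gate (hypothetical API, for illustration only).

def generate(prompt: str) -> str:
    # Placeholder: in a real workflow this calls the primary model.
    return f"Draft answer for: {prompt}"

def judge(output: str, criteria: str) -> dict:
    # Placeholder: in a real workflow the judge model scores the output.
    score = 1.0 if "answer" in output.lower() else 0.0
    return {"score": score, "critique": f"Checked against: {criteria}"}

def answer_with_quality_gate(prompt: str, criteria: str, threshold: float = 0.8) -> str:
    # Only deliver the draft if the judge's score clears the threshold.
    draft = generate(prompt)
    verdict = judge(draft, criteria)
    if verdict["score"] >= threshold:
        return draft
    return f"[needs revision] {verdict['critique']}"
```

The point of the pattern is the separation of roles: the generator never grades its own work, and the threshold decides whether the output ships or gets flagged.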

When to use it

  • Automated Grading: Use when an agent needs to score student answers, code submissions, or creative writing against rubrics without human intervention.
  • Safety Filtering: Deploy before sending sensitive data to external APIs to ensure prompts and responses comply with safety guidelines.
  • Consistency Checks: Run in parallel to verify that different agents produce consistent results on the same input task.
  • Feedback Loops: Integrate into iterative generation cycles where an agent refines its output based on a judge's critique until a score threshold is met.
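The feedback-loop use case above amounts to revising until a score threshold is met. A rough sketch, assuming hypothetical `generate`, `judge`, and `revise` callables supplied by the caller:

```python
def refine_until_accepted(prompt, generate, judge, revise,
                          threshold=0.9, max_rounds=3):
    # Generate a first draft, then let the judge's critique drive revisions
    # until the score clears the threshold or we run out of rounds.
    output = generate(prompt)
    for _ in range(max_rounds):
        score, critique = judge(output)
        if score >= threshold:
            break
        output = revise(output, critique)
    return output
```

Capping the loop with `max_rounds` matters in practice: without it, a judge that never awards a passing score would spin forever.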

Key capabilities

  • Dual-model architecture separating generator and evaluator roles.
  • Configurable scoring rubrics for custom evaluation metrics.
  • Automated feedback generation alongside numerical scores.
  • Support for iterative refinement based on judge input.
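A configurable scoring rubric of the kind listed above could be structured like this; the field names and weights are illustrative assumptions, not the skill's actual schema:

```python
# Illustrative rubric: weighted criteria combined into one overall score.
RUBRIC = {
    "accuracy": {"weight": 0.5, "description": "Factually correct and complete"},
    "tone":     {"weight": 0.3, "description": "Matches the requested register"},
    "format":   {"weight": 0.2, "description": "Follows length and layout constraints"},
}

def weighted_score(scores: dict) -> float:
    # Each per-criterion score is in [0.0, 1.0]; weights sum to 1.0,
    # so the result is also in [0.0, 1.0].
    return sum(RUBRIC[name]["weight"] * scores[name] for name in RUBRIC)
```

Keeping weights explicit makes the evaluation metric auditable: you can see exactly how much a tone slip costs relative to a factual error.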

Example prompts

  1. "Generate a Python function to sort a list, then use the llm-judge skill to verify it handles edge cases like empty lists and duplicates."
  2. "Draft a customer service email response, but pass it through the judge first to ensure the tone remains empathetic before sending."
  3. "Create a multiple-choice quiz on quantum physics, then have the judge evaluate the questions for factual accuracy against a provided textbook summary."

Tips & gotchas

Ensure the judge model has access to the same context or reference materials as the generator to avoid biased evaluations due to information gaps. For high-stakes decisions, configure the system to require human review if the judge's confidence score falls below a specific threshold rather than auto-rejecting low-scoring outputs.
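The human-review fallback suggested above can be expressed as a simple routing rule. A minimal sketch with hypothetical names and thresholds:

```python
def route_verdict(score: float, confidence: float,
                  accept_at: float = 0.8, min_confidence: float = 0.6) -> str:
    # When the judge itself is unsure, escalate to a person
    # instead of auto-rejecting a possibly fine output.
    if confidence < min_confidence:
        return "human_review"
    return "accept" if score >= accept_at else "reject"
```

Checking confidence before score ensures low-confidence verdicts never silently become rejections.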

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License:
Author: existential-birds
Installs: 49


Passed automated security scans.