LLM Judge
Lets an AI agent score and critique another model's outputs against configurable criteria, adding an automated LLM-as-judge quality layer.
Install on your platform
Run in terminal (recommended)
claude mcp add llm-judge -- npx -y @trustedskills/llm-judge
Or manually add to ~/.claude.json
```json
{
  "mcpServers": {
    "llm-judge": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-judge"
      ]
    }
  }
}
```

Requires Claude Code (the claude CLI). Run claude --version to verify your install.
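To confirm the server was registered, you can list your configured MCP servers:

claude mcp list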
About This Skill
The llm-judge skill enables AI agents to evaluate other models' outputs against specific criteria, acting as an automated quality control layer. It allows a primary agent to generate responses while a secondary "judge" model scores them for accuracy, tone, or adherence to constraints before final delivery. This creates a self-correcting loop that significantly improves reliability in complex workflows.
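A minimal sketch of that self-correcting loop, assuming hypothetical function names and a 0–1 score scale (the actual tool interface exposed by @trustedskills/llm-judge may differ):

```python
# Sketch of the generate -> judge -> refine loop described above.
# generate() and judge() are stand-ins for real model calls; the names,
# score scale, and return shapes are assumptions, not this skill's API.

def generate(prompt: str) -> str:
    """Stand-in for the primary (generator) model."""
    return f"Answer to: {prompt}"

def judge(output: str, criteria: str) -> tuple[float, str]:
    """Stand-in for the judge model: returns (score, critique)."""
    if "Reviewer feedback" in output:  # simulate an improved second draft
        return 0.9, "Looks good."
    return 0.6, "Tighten wording and cover edge cases."

def generate_with_review(prompt: str, criteria: str,
                         threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Regenerate until the judge's score clears the threshold."""
    output = generate(prompt)
    for _ in range(max_rounds):
        score, critique = judge(output, criteria)
        if score >= threshold:
            break
        # Feed the judge's critique back into the next generation round.
        output = generate(f"{prompt}\n\nReviewer feedback: {critique}")
    return output

print(generate_with_review("Summarize MCP in one paragraph.",
                           "accuracy, concision, neutral tone"))
```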
When to use it
- Automated Grading: Use when an agent needs to score student answers, code submissions, or creative writing against rubrics without human intervention.
- Safety Filtering: Deploy before sending sensitive data to external APIs to ensure prompts and responses comply with safety guidelines.
- Consistency Checks: Run in parallel to verify that different agents produce consistent results on the same input task.
- Feedback Loops: Integrate into iterative generation cycles where an agent refines its output based on a judge's critique until a score threshold is met.
Key capabilities
- Dual-model architecture separating generator and evaluator roles.
- Configurable scoring rubrics for custom evaluation metrics (see the sketch after this list).
- Automated feedback generation alongside numerical scores.
- Support for iterative refinement based on judge input.
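For illustration, a rubric might look like the following. The field names, weights, and aggregation are assumptions for the sketch, not the skill's actual schema:

```python
# Hypothetical rubric: weighted criteria rolled up into a single score.
# Field names and weights are illustrative, not @trustedskills/llm-judge's schema.
rubric = {
    "accuracy": {"weight": 0.5, "description": "Claims are factually correct."},
    "tone":     {"weight": 0.3, "description": "Empathetic, professional voice."},
    "brevity":  {"weight": 0.2, "description": "Stays on topic, no filler."},
}

def aggregate(per_criterion: dict[str, float]) -> float:
    """Weighted average of per-criterion judge scores, each in [0.0, 1.0]."""
    return sum(rubric[name]["weight"] * score
               for name, score in per_criterion.items())

print(round(aggregate({"accuracy": 0.9, "tone": 1.0, "brevity": 0.7}), 2))  # 0.89
```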
Example prompts
- "Generate a Python function to sort a list, then use the llm-judge skill to verify it handles edge cases like empty lists and duplicates."
- "Draft a customer service email response, but pass it through the judge first to ensure the tone remains empathetic before sending."
- "Create a multiple-choice quiz on quantum physics, then have the judge evaluate the questions for factual accuracy against a provided textbook summary."
Tips & gotchas
- Ensure the judge model has access to the same context or reference materials as the generator to avoid biased evaluations caused by information gaps.
- For high-stakes decisions, require human review when the judge's confidence score falls below a set threshold rather than auto-rejecting low-scoring outputs (sketched below).
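A concrete sketch of that escalation policy; the thresholds and names are illustrative assumptions, not part of the skill's API:

```python
# Routing sketch for the tip above: a shaky judgment goes to a human
# instead of being auto-rejected. Thresholds and names are illustrative.
CONFIDENCE_FLOOR = 0.7   # below this, don't trust the judge's verdict
SCORE_THRESHOLD = 0.8    # passing score for the output itself

def route(score: float, confidence: float) -> str:
    if confidence < CONFIDENCE_FLOOR:
        return "human_review"   # uncertain judge: escalate, don't auto-reject
    if score >= SCORE_THRESHOLD:
        return "deliver"
    return "regenerate"         # confident failure: loop back with the critique

print(route(score=0.65, confidence=0.55))  # human_review
```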
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Auditor | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
Passed automated security scans.