Addon LLM Judge Evals

🌐 Community
by ajrlewis · latest · Repository

Provides LLMs guidance and assistance for building AI and machine learning applications.

Install on your platform


1. Run in terminal (recommended)
claude mcp add addon-llm-judge-evals npx -- -y @trustedskills/addon-llm-judge-evals
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "addon-llm-judge-evals": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/addon-llm-judge-evals"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill provides evaluation capabilities for Large Language Models (LLMs). It allows you to assess LLM outputs against predefined criteria, providing structured feedback and scores. The tool is designed to automate aspects of model validation and ensure consistent quality across different generations.
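As a rough illustration of the judge pattern, the sketch below builds a rubric-based judge prompt and parses a structured verdict. The prompt wording and JSON schema here are assumptions for illustration only, not the addon's actual internals:

```python
import json

def build_judge_prompt(output: str, criteria: list[str]) -> str:
    """Format a judge prompt requesting a JSON verdict (generic sketch,
    not the addon's real prompt template)."""
    rubric = "\n".join(f"- {c}" for c in criteria)
    return (
        "You are an impartial evaluator. Score the output below against "
        "each criterion from 1 (poor) to 5 (excellent).\n"
        f"Criteria:\n{rubric}\n"
        f"Output to evaluate:\n{output}\n"
        'Reply with JSON: {"scores": {<criterion>: <int>}, "feedback": <string>}'
    )

def parse_verdict(reply: str) -> dict:
    """Parse the judge model's JSON reply into scores and feedback."""
    verdict = json.loads(reply)
    return {"scores": verdict["scores"], "feedback": verdict.get("feedback", "")}
```

The judge model's raw reply is passed to `parse_verdict`, giving callers a uniform structure regardless of which model acted as judge.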

When to use it

  • Evaluating code generation: Assess the correctness and efficiency of code produced by an LLM.
  • Content quality assessment: Score generated text based on factors like relevance, coherence, and factual accuracy.
  • Comparing model outputs: Compare responses from different LLMs for a given prompt to identify strengths and weaknesses.
  • Automated testing pipelines: Integrate evaluations into automated workflows for continuous model improvement.
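For the pipeline case, a common pattern is to gate a build on the judge's scores. A hypothetical threshold check (the function and threshold are illustrative, not part of the addon's API):

```python
def passes_quality_gate(scores: dict[str, int], threshold: float = 4.0) -> bool:
    """Fail the pipeline when the mean criterion score drops below threshold.

    `scores` maps criterion names to 1-5 judge scores, as in the sketch above.
    """
    mean = sum(scores.values()) / len(scores)
    return mean >= threshold
```

A CI step could then fail the build whenever `passes_quality_gate` returns False for any evaluated sample.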

Key capabilities

  • LLM output evaluation
  • Scoring against criteria
  • Structured feedback generation
  • Model comparison
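The comparison capability can be approximated by scoring each response against the same rubric and tallying per-criterion wins. A hedged sketch (not the addon's actual comparison logic):

```python
def compare_responses(scores_a: dict[str, int], scores_b: dict[str, int]) -> str:
    """Return "A", "B", or "tie" by counting which response wins more criteria."""
    wins_a = sum(1 for c in scores_a if scores_a[c] > scores_b.get(c, 0))
    wins_b = sum(1 for c in scores_b if scores_b[c] > scores_a.get(c, 0))
    if wins_a > wins_b:
        return "A"
    if wins_b > wins_a:
        return "B"
    return "tie"
```

Per-criterion tallying is more robust than comparing raw score sums when criteria are on different difficulty levels.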

Example prompts

  • "Evaluate the following code snippet: [code]"
  • "Score this text based on relevance and coherence: [text]"
  • "Compare these two responses to the prompt 'Write a poem about cats': [response 1] [response 2]"

Tips & gotchas

The quality of evaluations depends heavily on the clarity and specificity of the evaluation criteria. Ensure your prompts clearly define what constitutes good or bad output for optimal results.
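For example, a vague criterion forces the judge to invent its own standard, while specific criteria yield repeatable scores (illustrative wording only, not taken from the addon's documentation):

```python
# Vague: the judge must guess what "good" means, so scores drift run to run.
vague_criterion = "The code should be good."

# Specific: each criterion is independently checkable by the judge.
specific_criteria = [
    "Runs without raising an exception on the provided inputs",
    "Handles the empty-input edge case explicitly",
    "Uses descriptive variable and function names",
]
```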

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

Version: latest
License:
Author: ajrlewis
Installs: 6

🌐 Community: passed automated security scans.