Addon Llm Judge Evals

Name: Addon Llm Judge Evals
Author: ajrlewis

🌐Community

Provides LLMs with guidance and assistance for building AI and machine learning applications.

Install on your platform

Run in terminal (recommended)

claude mcp add addon-llm-judge-evals npx -- -y @trustedskills/addon-llm-judge-evals

Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "addon-llm-judge-evals": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/addon-llm-judge-evals"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill evaluates the outputs of Large Language Models (LLMs). It assesses each output against predefined criteria and returns structured feedback and scores. It is designed to automate parts of model validation and to help keep quality consistent across generations.
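
To make "structured feedback and scores" concrete, here is a minimal sketch of how a judged result could be represented and aggregated. The `CriterionScore` and `Evaluation` classes, the 1 to 5 scale, and the weights are all assumptions made for illustration; the skill's actual output schema is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class CriterionScore:
    """One judged criterion (hypothetical shape, not the skill's schema)."""
    name: str
    score: int          # assumed scale: 1 (poor) to 5 (excellent)
    feedback: str       # short justification for the score
    weight: float = 1.0


@dataclass
class Evaluation:
    """Structured feedback for a single LLM output."""
    criteria: list[CriterionScore] = field(default_factory=list)

    def overall(self) -> float:
        """Weighted mean of the criterion scores."""
        total_weight = sum(c.weight for c in self.criteria)
        if total_weight == 0:
            raise ValueError("no weighted criteria to aggregate")
        return sum(c.score * c.weight for c in self.criteria) / total_weight


# Example values are made up for illustration.
evaluation = Evaluation([
    CriterionScore("relevance", 4, "Answers the question asked.", weight=2.0),
    CriterionScore("coherence", 5, "Clear structure throughout."),
    CriterionScore("factual accuracy", 3, "One unsupported claim.", weight=2.0),
])
print(round(evaluation.overall(), 2))  # 3.8
```

Keeping a per-criterion score and justification, rather than a single number, makes it easier to see why an output scored the way it did.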

When to use it

Evaluating code generation: Assess the correctness and efficiency of code produced by an LLM.
Content quality assessment: Score generated text based on factors like relevance, coherence, and factual accuracy.
Comparing model outputs: Compare responses from different LLMs for a given prompt to identify strengths and weaknesses.
Automated testing pipelines: Integrate evaluations into automated workflows for continuous model improvement.
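
Two of the use cases above, comparing models and gating a pipeline, can be sketched once scores exist. This is an illustrative sketch only: the `compare` and `passes_gate` helpers, the criterion names, and the 3.5 threshold are hypothetical and not part of the skill.

```python
from statistics import mean


def compare(scores_a: dict[str, int], scores_b: dict[str, int]) -> str:
    """Return 'A', 'B', or 'tie' based on the mean score over shared criteria."""
    shared = sorted(set(scores_a) & set(scores_b))
    if not shared:
        raise ValueError("no shared criteria to compare")
    mean_a = mean(scores_a[c] for c in shared)
    mean_b = mean(scores_b[c] for c in shared)
    if mean_a == mean_b:
        return "tie"
    return "A" if mean_a > mean_b else "B"


def passes_gate(scores: list[float], threshold: float = 3.5) -> bool:
    """Pipeline check: True if the mean score meets the threshold."""
    return bool(scores) and mean(scores) >= threshold


# Scores below are made-up judge results for two models on one prompt.
model_a = {"correctness": 4, "efficiency": 3}
model_b = {"correctness": 5, "efficiency": 3}
print(compare(model_a, model_b))      # B
print(passes_gate([4.0, 3.0, 3.8]))   # True
```

In an automated workflow, a failing gate would typically fail the build so that a regression in output quality is caught before release.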

Key capabilities

LLM output evaluation
Scoring against criteria
Structured feedback generation
Model comparison

Example prompts

"Evaluate the following code snippet: [code]"
"Score this text based on relevance and coherence: [text]"
"Compare these two responses to the prompt 'Write a poem about cats': [response 1] [response 2]"

Tips & gotchas

The quality of evaluations depends heavily on the clarity and specificity of the evaluation criteria. Ensure your prompts clearly define what constitutes good or bad output for optimal results.
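
One way to follow this tip is to spell out each criterion and the scoring scale in the prompt itself. The sketch below assembles such a prompt; the `build_judge_prompt` helper and the prompt wording are assumptions for illustration, not a format the skill requires.

```python
def build_judge_prompt(
    output: str,
    criteria: dict[str, str],
    scale: tuple[int, int] = (1, 5),
) -> str:
    """Build an evaluation prompt that defines every criterion explicitly."""
    low, high = scale
    lines = [
        f"Score the output below on each criterion from {low} (worst) to {high} (best).",
        "Give one sentence of justification per criterion.",
        "",
        "Criteria:",
    ]
    # Each criterion carries its own definition of what good output looks like.
    lines += [f"- {name}: {definition}" for name, definition in criteria.items()]
    lines += ["", "Output to evaluate:", output]
    return "\n".join(lines)


prompt = build_judge_prompt(
    "Cats nap in sunbeams all day long.",
    {
        "relevance": "Addresses the prompt 'Write a poem about cats'.",
        "coherence": "Ideas follow logically with no contradictions.",
    },
)
print(prompt)
```

Compared with a bare "Evaluate this text", a prompt like this tells the judge what to look for and how to express the result.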

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub	Pass
Socket	Pass
Snyk	Pass

Details

Version: latest
License
Author: ajrlewis
Installs: 6

Passed automated security scans.