LLM Evaluation

🌐 Community
by wshobson · latest · Repository

Provides LLMs with guidance and assistance for building AI and machine learning applications.

Install on your platform


1. Run in terminal (recommended)

claude mcp add llm-evaluation npx -- -y @trustedskills/llm-evaluation
2. Or manually add to ~/.claude/settings.json


~/.claude/settings.json
{
  "mcpServers": {
    "llm-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-evaluation"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

The llm-evaluation skill enables users to assess the performance of large language models (LLMs) by defining evaluation criteria, scoring responses, and providing detailed feedback. It supports both automated and manual evaluation methods, making it useful for refining model outputs and ensuring alignment with desired outcomes.

When to use it

  • You need to evaluate the accuracy or quality of an LLM's response to a specific query.
  • You want to compare multiple models based on predefined metrics such as relevance, coherence, or factual correctness.
  • You are iterating on prompts and need structured feedback to improve model performance.

Key capabilities

  • Automated scoring based on user-defined criteria
  • Manual evaluation with customizable rubrics
  • Comparison of multiple model responses side by side
  • Detailed feedback generation for each response
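The capabilities above can be sketched in code. The skill's actual interface is not documented on this page, so the names below (`Rubric`, `score_response`) and the weighting scheme are purely illustrative, a minimal sketch of automated scoring against user-defined criteria:

```python
from dataclasses import dataclass

@dataclass
class Rubric:
    # criterion name -> weight; weights are assumed to sum to 1.0
    criteria: dict[str, float]

def score_response(rubric: Rubric, scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-10 scale) into one weighted total."""
    return sum(weight * scores[name] for name, weight in rubric.criteria.items())

# Side-by-side comparison of two model responses under the same rubric
rubric = Rubric({"accuracy": 0.5, "clarity": 0.25, "relevance": 0.25})
model_a = {"accuracy": 8.0, "clarity": 7.0, "relevance": 9.0}
model_b = {"accuracy": 6.0, "clarity": 9.0, "relevance": 8.0}

print(score_response(rubric, model_a))  # 8.0
print(score_response(rubric, model_b))  # 7.25
```

Weighting accuracy more heavily than style criteria, as here, keeps comparisons focused on factual correctness; the skill's own rubric format may differ.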

Example prompts

  • "Evaluate this LLM's response against the following criteria: accuracy, clarity, and relevance."
  • "Compare the outputs from Model A and Model B using a rubric focused on factual correctness."
  • "Provide detailed feedback on how well this model answered the question about climate change."

Tips & gotchas

  • Define clear evaluation criteria in advance to ensure consistent results.
  • Manual evaluations may be time-consuming for large datasets, so consider automating where possible.
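The second tip, automating where possible, can look like the following sketch: run a cheap automated check over the whole dataset first and reserve manual review for responses that fail it. The `keyword_coverage` heuristic and the data are hypothetical, not part of the skill:

```python
def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the response (a crude proxy
    for factual coverage; stricter automated checks can be swapped in)."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in response.lower())
    return hits / len(expected_keywords)

dataset = [
    {"response": "Rising CO2 emissions drive global warming.", "keywords": ["CO2", "warming"]},
    {"response": "The weather is nice today.", "keywords": ["CO2", "warming"]},
]

# Only low-coverage responses go to (slow) manual evaluation
needs_manual_review = [
    item for item in dataset
    if keyword_coverage(item["response"], item["keywords"]) < 0.5
]
print(len(needs_manual_review))  # 1
```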

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License:
Author: wshobson
Installs: 2.8k


Passed automated security scans.