LLM Evaluation

🌐 Community
by sickn33 · version: latest · Repository

Provides LLMs guidance and assistance for building AI and machine learning applications.

Install on your platform


1. Run in terminal (recommended)

claude mcp add sickn33-llm-evaluation npx -- -y @trustedskills/sickn33-llm-evaluation
2. Or manually add to ~/.claude/settings.json:
{
  "mcpServers": {
    "sickn33-llm-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/sickn33-llm-evaluation"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill enables AI agents to perform structured evaluations of Large Language Models (LLMs) using predefined criteria and scoring mechanisms. It automates the assessment process to provide quantitative feedback on model performance across specific tasks or domains.
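A structured, criteria-based evaluation of this kind can be sketched as a weighted rubric. This is a minimal illustration under assumptions: the `Criterion` class, the 0–5 score scale, and the weights are hypothetical and are not the skill's actual API.

```python
from dataclasses import dataclass

# Hypothetical rubric entry: a named criterion with a relative weight.
@dataclass
class Criterion:
    name: str
    weight: float

def weighted_score(scores: dict[str, int], rubric: list[Criterion]) -> float:
    """Combine per-criterion scores (0-5) into one weighted 0-5 score."""
    total_weight = sum(c.weight for c in rubric)
    return sum(scores[c.name] * c.weight for c in rubric) / total_weight

# Illustrative criteria and scores, echoing the ones named in the example prompts.
rubric = [Criterion("accuracy", 0.5), Criterion("relevance", 0.3), Criterion("tone", 0.2)]
scores = {"accuracy": 4, "relevance": 5, "tone": 3}
print(weighted_score(scores, rubric))
```

Weighting the criteria lets the same rubric emphasize different qualities (e.g. accuracy over tone) without changing the per-criterion scoring.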

When to use it

  • Validating the accuracy and relevance of an LLM's responses during development cycles.
  • Comparing multiple model versions against a standardized rubric to identify improvements.
  • Ensuring compliance with safety guidelines by scoring outputs for harmful content.
  • Generating detailed performance reports for stakeholder review or deployment readiness.

Key capabilities

  • Executes automated evaluation workflows based on user-defined parameters.
  • Assigns numerical scores to LLM outputs based on specific quality metrics.
  • Provides structured feedback highlighting strengths and weaknesses in model responses.
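The "structured feedback" capability can be pictured as splitting per-criterion scores into strengths and weaknesses around a threshold. A minimal sketch, assuming the same hypothetical 0–5 scale as above; the function name and threshold are illustrative, not the skill's interface:

```python
def structured_feedback(scores: dict[str, int], threshold: int = 3) -> dict[str, list[str]]:
    """Split criteria into strengths and weaknesses around a score threshold."""
    return {
        "strengths": [name for name, s in scores.items() if s >= threshold],
        "weaknesses": [name for name, s in scores.items() if s < threshold],
    }

feedback = structured_feedback({"accuracy": 4, "relevance": 5, "tone": 2})
print(feedback)  # {'strengths': ['accuracy', 'relevance'], 'weaknesses': ['tone']}
```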

Example prompts

  • "Evaluate this LLM response for factual accuracy and tone consistency using the standard rubric."
  • "Run a comparative evaluation of three different model outputs against our safety guidelines."
  • "Score the following generated text based on creativity, coherence, and adherence to instructions."

Tips & gotchas

Ensure you have clear, well-defined evaluation criteria before running assessments to avoid ambiguous scoring. This skill relies on structured input; vague prompts may result in less actionable evaluation data.
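One way to catch ambiguous criteria before running an assessment is a quick structural check. This is a hypothetical pre-flight sketch (the field names `name`, `description`, and `scale` are assumptions), not part of the skill itself:

```python
def validate_criteria(criteria: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the criteria are well-defined."""
    problems = []
    for c in criteria:
        for field in ("name", "description", "scale"):
            if not c.get(field):
                problems.append(f"criterion {c.get('name', '?')!r} is missing {field!r}")
    return problems

criteria = [
    {"name": "accuracy", "description": "Claims match the source.", "scale": "0-5"},
    {"name": "tone"},  # vague: no description or scale, so scoring would be ambiguous
]
print(validate_criteria(criteria))
```

Running the check above flags the underspecified "tone" criterion so it can be tightened before any scores are assigned.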

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: latest
  • License:
  • Author: sickn33
  • Installs: 76

Passed automated security scans.