LLM Evaluation
Provides LLMs with guidance and assistance for building AI and machine learning applications.
Install on your platform
Run in terminal (recommended)
claude mcp add llm-evaluation npx -- -y @trustedskills/llm-evaluation
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "llm-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/llm-evaluation"
      ]
    }
  }
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
The llm-evaluation skill helps you assess the performance of large language models (LLMs) by defining evaluation criteria, scoring responses, and providing detailed feedback. It supports both automated and manual evaluation, which makes it useful for refining model outputs and checking that they meet your intended requirements.
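To make "defining criteria, scoring responses, and providing feedback" concrete, here is a minimal Python sketch of weighted rubric scoring. It is not the skill's implementation: the `Criterion` class, the `score_response` function, the 1 to 5 rating scale, the weights, and the feedback threshold are all hypothetical choices for illustration. The per-criterion ratings could come from a human reviewer or an automated grader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One evaluation criterion (hypothetical structure, not the skill's API)."""

    name: str
    weight: float  # relative importance; weights are normalized during scoring


def score_response(
    scores: dict[str, int],
    criteria: list[Criterion],
    scale_max: int = 5,
    feedback_threshold: int = 3,
) -> tuple[float, list[str]]:
    """Combine 1..scale_max ratings into a weighted score in [0, 1] plus feedback.

    `scores` maps each criterion name to the rating a human or automated
    grader assigned. Criteria rated below `feedback_threshold` are flagged.
    """
    if scale_max < 2:
        raise ValueError("scale_max must be at least 2")
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")

    weighted = 0.0
    feedback: list[str] = []
    for c in criteria:
        if c.name not in scores:
            raise KeyError(f"missing score for criterion {c.name!r}")
        rating = scores[c.name]
        if not 1 <= rating <= scale_max:
            raise ValueError(f"{c.name}: rating {rating} is outside 1..{scale_max}")
        # Rescale the rating from 1..scale_max onto [0, 1], then apply the weight.
        weighted += c.weight * (rating - 1) / (scale_max - 1)
        if rating < feedback_threshold:
            feedback.append(f"{c.name}: scored {rating}/{scale_max}, needs improvement")
    return weighted / total_weight, feedback


# Example rubric: accuracy counts twice as much as clarity or relevance.
criteria = [
    Criterion("accuracy", 0.5),
    Criterion("clarity", 0.25),
    Criterion("relevance", 0.25),
]
total, notes = score_response({"accuracy": 5, "clarity": 2, "relevance": 4}, criteria)
print(f"weighted score: {total:.2f}")
for note in notes:
    print(note)
```

Normalizing by the total weight means the weights do not have to sum to 1, and rescaling each rating onto [0, 1] keeps scores comparable if you later change the rating scale.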
When to use it
- You need to evaluate the accuracy or quality of an LLM's response to a specific query.
- You want to compare multiple models based on predefined metrics such as relevance, coherence, or factual correctness.
- You are iterating on prompts and need structured feedback to improve model performance.
Key capabilities
- Automated scoring based on user-defined criteria
- Manual evaluation with customizable rubrics
- Comparison of multiple model responses side by side
- Detailed feedback generation for each response
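Side-by-side comparison over many prompts is often summarized as win, tie, and loss counts. The following Python sketch assumes each model's response to a prompt has already been reduced to a single numeric score (for example, a rubric score); the `compare_models` function, the prompt IDs, and the `tie_margin` parameter are hypothetical and not part of the skill.

```python
from collections import Counter


def compare_models(
    scores_a: dict[str, float],
    scores_b: dict[str, float],
    tie_margin: float = 0.0,
) -> Counter:
    """Count prompts where model A wins, ties, or loses against model B.

    Both dicts map a prompt ID to that model's score on the prompt. Only
    prompts scored for both models are compared.
    """
    outcomes: Counter = Counter()
    for prompt_id in scores_a.keys() & scores_b.keys():
        diff = scores_a[prompt_id] - scores_b[prompt_id]
        if abs(diff) <= tie_margin:
            outcomes["tie"] += 1  # difference too small to call
        elif diff > 0:
            outcomes["a_wins"] += 1
        else:
            outcomes["b_wins"] += 1
    return outcomes


model_a = {"q1": 0.9, "q2": 0.4, "q3": 0.7}
model_b = {"q1": 0.6, "q2": 0.8, "q3": 0.7, "q4": 0.5}  # q4 has no A score, so it is skipped
result = compare_models(model_a, model_b)
decided = result["a_wins"] + result["b_wins"]
win_rate = result["a_wins"] / decided if decided else None
print(dict(result), "A win rate excluding ties:", win_rate)
```

Reporting ties separately keeps either model's win rate from being inflated when scores are equal or within noise; set `tie_margin` to match how precise your scores actually are.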
Example prompts
- "Evaluate this LLM's response against the following criteria: accuracy, clarity, and relevance."
- "Compare the outputs from Model A and Model B using a rubric focused on factual correctness."
- "Provide detailed feedback on how well this model answered the question about climate change."
Tips & gotchas
- Define clear evaluation criteria in advance to ensure consistent results.
- Manual evaluations may be time-consuming for large datasets, so consider automating where possible.
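One common way to automate part of an evaluation over a large dataset is to compare each response with a reference answer using token-overlap F1, the metric used alongside exact match in question-answering benchmarks such as SQuAD. The Python sketch below uses a simplified normalization (lowercasing and stripping punctuation); the `normalize` and `token_f1` helpers and the sample data are hypothetical, and the skill is not documented to use this metric.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count_a, count_b) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (model response, reference answer) pairs.
dataset = [
    ("Paris is the capital of France.", "Paris"),
    ("The boiling point is 100 C at sea level.", "100 degrees Celsius at sea level"),
]
scores = [token_f1(pred, ref) for pred, ref in dataset]
print([round(s, 3) for s in scores], "mean:", round(sum(scores) / len(scores), 3))
```

Because token overlap rewards shared wording rather than verified facts, it works best as a cheap first pass that routes low-scoring or borderline items to rubric-based or manual review.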
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.
Security Audits
| Audit | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.