Addon LLM Judge Evals
Provides LLMs guidance and assistance for building AI and machine learning applications.
Install on your platform
Run in terminal (recommended)
claude mcp add addon-llm-judge-evals npx -- -y @trustedskills/addon-llm-judge-evals
Or manually add to ~/.claude/settings.json
{
"mcpServers": {
"addon-llm-judge-evals": {
"command": "npx",
"args": [
"-y",
"@trustedskills/addon-llm-judge-evals"
]
}
}
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill provides evaluation capabilities for Large Language Models (LLMs). It allows you to assess LLM outputs against predefined criteria, providing structured feedback and scores. The tool is designed to automate aspects of model validation and ensure consistent quality across different generations.
When to use it
- Evaluating code generation: Assess the correctness and efficiency of code produced by an LLM.
- Content quality assessment: Score generated text based on factors like relevance, coherence, and factual accuracy.
- Comparing model outputs: Compare responses from different LLMs for a given prompt to identify strengths and weaknesses.
- Automated testing pipelines: Integrate evaluations into automated workflows for continuous model improvement.
Key capabilities
- LLM output evaluation
- Scoring against criteria
- Structured feedback generation
- Model comparison
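As a rough illustration of what scoring with structured feedback can look like on the consuming side, the sketch below parses a judge response and checks that each score falls within the expected range. The JSON shape (a "criteria" list with "name", "score", and "feedback" fields), the 1-5 scale, and the names CriterionResult, parse_judge_output, and mean_score are assumptions made for this example; the skill's actual output format is not documented here.

```python
import json
from dataclasses import dataclass


@dataclass
class CriterionResult:
    name: str       # hypothetical criterion label, e.g. "relevance"
    score: int      # assumed integer scale, 1-5 by default
    feedback: str   # free-text justification from the judge


def parse_judge_output(raw: str, scale=(1, 5)) -> list[CriterionResult]:
    """Parse an assumed JSON judge response and reject out-of-range scores."""
    data = json.loads(raw)
    low, high = scale
    results = []
    for item in data["criteria"]:
        score = int(item["score"])
        if not low <= score <= high:
            raise ValueError(
                f"score {score} for {item['name']!r} is outside {low}-{high}"
            )
        results.append(
            CriterionResult(item["name"], score, item.get("feedback", ""))
        )
    return results


def mean_score(results: list[CriterionResult]) -> float:
    """Average the per-criterion scores into a single number."""
    return sum(r.score for r in results) / len(results)
```

Validating the range before aggregating keeps a malformed judge response from silently skewing an average in an automated pipeline.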
Example prompts
- "Evaluate the following code snippet: [code]"
- "Score this text based on relevance and coherence: [text]"
- "Compare these two responses to the prompt 'Write a poem about cats': [response 1] [response 2]"
Tips & gotchas
The quality of evaluations depends heavily on the clarity and specificity of the evaluation criteria. Ensure your prompts clearly define what constitutes good or bad output for optimal results.
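One way to keep criteria explicit is to assemble the evaluation prompt from named criteria, each with a written definition, rather than asking for a bare score. The helper below is a minimal sketch under that assumption; build_judge_prompt, the criteria wording, and the 1-5 scale are hypothetical and not part of the skill itself.

```python
def build_judge_prompt(output: str, criteria: dict[str, str], scale=(1, 5)) -> str:
    """Build an evaluation prompt that spells out each criterion.

    `criteria` maps a criterion name to a one-sentence definition of
    what a good output looks like for that criterion.
    """
    low, high = scale
    lines = [
        f"Score the text below on each criterion from {low} (worst) to {high} (best).",
        "Give one sentence of justification per criterion.",
        "",
        "Criteria:",
    ]
    for name, definition in criteria.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Text to evaluate:", output]
    return "\n".join(lines)


# Example usage with two illustrative criteria.
prompt = build_judge_prompt(
    "Cats sleep for most of the day.",
    {
        "relevance": "The text directly addresses the user's question.",
        "coherence": "Sentences follow logically with no contradictions.",
    },
)
```

Writing each definition out makes it easier to spot criteria that are too vague to score consistently.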
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.