Eval Accuracy
Evaluates the accuracy of generated content by comparing it to a reference source, helping you check that outputs are reliable and trustworthy.
Install on your platform
Run in terminal (recommended)
```shell
claude mcp add eval-accuracy npx -- -y @trustedskills/eval-accuracy
```
Or manually add to ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "eval-accuracy": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/eval-accuracy"
      ]
    }
  }
}
```
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
The eval-accuracy skill assesses the accuracy of AI assistant responses against a provided ground truth. It calculates and reports metrics such as precision, recall, and F1-score to quantify performance. This lets you compare AI models or prompt strategies on a consistent, quantitative basis.
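This page does not say how the skill turns free text into precision and recall. One common approach, assumed here, is the token-overlap scoring used in extractive QA benchmarks. Both texts are normalized, shared tokens are counted, and the scores are derived from that count. The function names below (`normalize`, `precision_recall_f1`) are hypothetical illustrations, not part of the skill's interface.

```python
from collections import Counter
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def precision_recall_f1(response: str, ground_truth: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 (an assumed scoring scheme)."""
    pred = normalize(response)
    ref = normalize(ground_truth)
    # Multiset intersection: a token counts at most min(count in pred, count in ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)  # share of response tokens found in the ground truth
    recall = overlap / len(ref)      # share of ground-truth tokens covered by the response
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    scores = precision_recall_f1("Paris is the capital", "The capital of France is Paris")
    print([round(s, 3) for s in scores])  # [1.0, 0.667, 0.8]
```

In this example, every response token appears in the ground truth, so precision is perfect. The response omits "of France", which lowers recall and therefore F1.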
When to use it
- Evaluating chatbot performance: Measure how accurately a chatbot answers questions compared to expected responses.
- Comparing model outputs: Determine which language model produces more accurate results for a given task.
- Prompt engineering optimization: Assess the impact of different prompts on the accuracy of AI-generated content.
- Benchmarking AI systems: Establish baseline performance metrics for ongoing monitoring and improvement.
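To compare models or prompts, you typically score each system over the same evaluation set and average the results. The sketch below assumes exact-match accuracy and token-overlap F1 as the two metrics. The evaluation set, the system names `prompt_a` and `prompt_b`, and their outputs are hypothetical toy data, not results from the skill.

```python
from collections import Counter
import string


def tokens(text: str) -> list[str]:
    # Lowercase and drop ASCII punctuation before splitting on whitespace.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def token_f1(response: str, reference: str) -> float:
    pred, ref = tokens(response), tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(response: str, reference: str) -> float:
    # 1.0 when the normalized token sequences are identical, else 0.0.
    return float(tokens(response) == tokens(reference))


def score_system(outputs: list[str], references: list[str]) -> dict[str, float]:
    """Average exact-match accuracy and token F1 over an evaluation set."""
    if not references or len(outputs) != len(references):
        raise ValueError("need one output per reference, and at least one reference")
    n = len(references)
    pairs = list(zip(outputs, references))
    return {
        "accuracy": sum(exact_match(o, r) for o, r in pairs) / n,
        "f1": sum(token_f1(o, r) for o, r in pairs) / n,
    }


# Hypothetical evaluation set and outputs from two prompt variants.
references = ["Paris", "4", "Jupiter"]
system_outputs = {
    "prompt_a": ["Paris", "four", "The largest planet is Jupiter"],
    "prompt_b": ["paris.", "4", "Saturn"],
}

if __name__ == "__main__":
    for name, outputs in system_outputs.items():
        print(name, score_system(outputs, references))
```

Reporting both numbers helps. Exact match penalizes a correct but verbose answer ("The largest planet is Jupiter") entirely, while F1 gives it partial credit.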
Key capabilities
- Accuracy assessment
- Precision calculation
- Recall calculation
- F1-score calculation
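For reference, these are the standard definitions behind the listed metrics. They are written in terms of true positives ($TP$), false positives ($FP$), and false negatives ($FN$). The page does not document how the skill maps text to these counts. Reading $TP$ as tokens shared by response and ground truth is an assumption.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{\#\,\text{correct responses}}{\#\,\text{responses}}, \\[4pt]
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\[4pt]
F_1 &= \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```

For example, $TP = 4$, $FP = 0$, $FN = 2$ gives $\text{Precision} = 1$, $\text{Recall} = 2/3$, and $F_1 = 8/(8 + 0 + 2) = 0.8$.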
Example prompts
- "Evaluate the following assistant response: '[Assistant Response]' against this ground truth: '[Ground Truth]'."
- "Calculate the F1 score for these two texts: '[Assistant Response]' and '[Ground Truth]'."
- "Assess the accuracy of this AI output: '[Assistant Output]' compared to the expected answer: '[Expected Answer]'."
Tips & gotchas
The quality of the ground truth data is crucial for accurate evaluation. Ensure your ground truths are comprehensive, correct, and representative of the desired response style.
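One common way to make ground truth more comprehensive is to accept several valid reference answers and keep the best score. This is a general evaluation practice, not necessarily what this skill does. The token-overlap F1 and the helper name `best_f1` below are illustrative assumptions.

```python
from collections import Counter
import string


def tokens(text: str) -> list[str]:
    # Lowercase and drop ASCII punctuation before splitting on whitespace.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def token_f1(response: str, reference: str) -> float:
    pred, ref = tokens(response), tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_f1(response: str, references: list[str]) -> float:
    """Score against every acceptable reference and keep the highest F1."""
    if not references:
        raise ValueError("at least one reference answer is required")
    return max(token_f1(response, ref) for ref in references)


if __name__ == "__main__":
    print(best_f1("Armstrong", ["Neil Armstrong"]))               # 0.666...
    print(best_f1("Armstrong", ["Neil Armstrong", "Armstrong"]))  # 1.0
```

With a single reference, the correct short answer "Armstrong" is penalized for omitting "Neil". Adding it as an accepted variant removes that artifact of an incomplete ground truth.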
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Scanner | Result |
|---|---|
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.