Model Evaluation Metrics
Calculates common machine learning evaluation metrics (accuracy, precision, recall, F1-score) for model performance assessment.
Install on your platform
Run in terminal (recommended)
claude mcp add model-evaluation-metrics npx -- -y @trustedskills/model-evaluation-metrics
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "model-evaluation-metrics": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-evaluation-metrics"
      ]
    }
  }
}

Requires Claude Code (the claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill allows AI agents to calculate and interpret common model evaluation metrics. It can compute values like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The results are presented in a structured format suitable for analysis and comparison of different models or configurations.
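To make the computed values concrete, here is a minimal sketch of the core metrics for the binary case, written in plain Python. The function names and the example labels are illustrative assumptions, not the skill's actual API.

```python
# Hypothetical sketch of the metrics this skill reports, for binary labels.
# Function and variable names are assumptions for illustration only.
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives from aligned label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(metrics(y_true, y_pred))  # all four metrics are 0.75 on this toy data
```

Note the zero-denominator guards: precision is undefined when a model predicts no positives, so a real implementation must decide how to report that case.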
When to use it
- Comparing Model Performance: Evaluate multiple machine learning models trained on the same dataset to determine which performs best.
- Fine-tuning Models: Assess how changes to model parameters impact key performance indicators.
- Debugging Model Issues: Identify potential biases or weaknesses in a model by examining metrics across different classes or demographics.
- Reporting Results: Generate standardized evaluation reports for stakeholders, including clear metric values and interpretations.
Key capabilities
- Calculates accuracy
- Calculates precision
- Calculates recall
- Calculates F1-score
- Calculates area under the ROC curve (AUC)
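Unlike the other capabilities, AUC cannot be derived from a single confusion matrix; it needs predicted scores alongside the ground-truth labels. One way to sketch it is the rank-based (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. This is an illustrative implementation, not the skill's internal method.

```python
def auc(y_true, scores):
    """Rank-based AUC: fraction of positive/negative pairs where the
    positive example scores higher, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(y_true, scores))  # 0.75: three of four pos/neg pairs ranked correctly
```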
Example prompts
- "Calculate the accuracy, precision, recall, and F1-score for this dataset: [dataset]"
- "What is the AUC of this model given these predicted scores and ground-truth labels?" (AUC requires per-example scores; it cannot be computed from true/false positive and negative counts alone.)
- "Compare the performance metrics (accuracy, precision, recall) of Model A and Model B based on these results."
Tips & gotchas
The skill requires a structured dataset that pairs ground-truth labels with predicted labels (or predicted scores, for AUC). Ensure the two sequences are aligned and properly formatted; mismatched lengths, swapped columns, or inconsistent label encodings will produce misleading results.
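As a sanity check before submitting data, it helps to validate the shape yourself. The record layout below is a plausible input format assumed for illustration, not the skill's documented schema.

```python
import json

# Assumed record layout (not the skill's documented schema): each record
# pairs a ground-truth label with a predicted label and an optional score.
payload = json.loads("""
{
  "records": [
    {"label": 1, "prediction": 1, "score": 0.92},
    {"label": 0, "prediction": 1, "score": 0.61},
    {"label": 0, "prediction": 0, "score": 0.08}
  ]
}
""")

y_true = [r["label"] for r in payload["records"]]
y_pred = [r["prediction"] for r in payload["records"]]

# Misaligned inputs are the most common source of misleading metrics.
assert len(y_true) == len(y_pred), "labels and predictions must be aligned"
assert set(y_true) <= {0, 1}, "labels must use a consistent binary encoding"
print(y_true, y_pred)
```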
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Auditor | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
Passed automated security scans.