Model Evaluation Metrics

🌐 Community
by jeremylongshore · vlatest · Repository

Calculates common machine learning evaluation metrics (accuracy, precision, recall, F1-score) for model performance assessment.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1. Run in terminal (recommended):

claude mcp add model-evaluation-metrics npx -- -y @trustedskills/model-evaluation-metrics
2. Or manually add to ~/.claude/settings.json:
{
  "mcpServers": {
    "model-evaluation-metrics": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-evaluation-metrics"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill allows AI agents to calculate and interpret common model evaluation metrics. It can compute values like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The results are presented in a structured format suitable for analysis and comparison of different models or configurations.
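
For reference, the standard definitions behind these metrics can be sketched in a few lines of Python. This is an illustrative implementation of the textbook formulas, not the skill's internal code; the function name and the sample data are invented for the example.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics from aligned ground-truth and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted (or never present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
# With 3 true positives, 1 false positive, 1 true negative, 1 false negative:
# precision = recall = f1 = 0.75, accuracy = 4/6
```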

When to use it

  • Comparing Model Performance: Evaluate multiple machine learning models trained on the same dataset to determine which performs best.
  • Fine-tuning Models: Assess how changes to model parameters impact key performance indicators.
  • Debugging Model Issues: Identify potential biases or weaknesses in a model by examining metrics across different classes or demographics.
  • Reporting Results: Generate standardized evaluation reports for stakeholders, including clear metric values and interpretations.

Key capabilities

  • Calculates accuracy
  • Calculates precision
  • Calculates recall
  • Calculates F1-score
  • Calculates area under the ROC curve (AUC)
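
Note that unlike the threshold-based metrics above, AUC cannot be computed from a single confusion matrix: it needs predicted scores (or probabilities) so that positives and negatives can be ranked. A minimal sketch of the standard rank-based equivalent (the probability that a random positive outscores a random negative), with invented sample data:

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example.
    Ties count as half a win."""
    pairs = 0
    wins = 0.0
    for t_i, s_i in zip(y_true, scores):
        if t_i != 1:
            continue
        for t_j, s_j in zip(y_true, scores):
            if t_j != 0:
                continue
            pairs += 1
            if s_i > s_j:
                wins += 1.0
            elif s_i == s_j:
                wins += 0.5
    return wins / pairs

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 positive/negative pairs ranked correctly -> 0.75
```

The O(n²) pairwise loop keeps the definition explicit; production code would sort once and integrate the ROC curve instead.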

Example prompts

  • "Calculate the accuracy, precision, recall, and F1-score for this dataset: [dataset]"
  • "What is the AUC of this model given these predicted scores and their ground-truth labels?"
  • "Compare the performance metrics (accuracy, precision, recall) of Model A and Model B based on these results."

Tips & gotchas

The skill requires a structured dataset containing ground truth labels and predicted values. Ensure your data is properly formatted for accurate metric calculation; incorrect formatting can lead to misleading results.
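
The exact input schema is not documented on this page; as a hypothetical illustration, a well-formed dataset pairs ground-truth labels with predictions of equal length, aligned element by element:

```python
# Hypothetical dataset shape (not the skill's documented schema):
dataset = {
    "y_true": [1, 0, 1, 1, 0],  # ground-truth class labels
    "y_pred": [1, 0, 0, 1, 0],  # model's predicted labels
}

# Sanity-check alignment before computing any metrics; a length
# mismatch silently truncates pairs and skews every metric.
assert len(dataset["y_true"]) == len(dataset["y_pred"]), \
    "labels and predictions must be aligned"
```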

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

Version: vlatest
License:
Author: jeremylongshore
Installs: 13


Passed automated security scans.