Agent Evaluation

🌐Community
by guia-matthieu · vlatest

Evaluates AI agents against predefined metrics and provides actionable feedback for improving their performance.

Install on your platform

1. Run in terminal (recommended)

claude mcp add guia-matthieu-agent-evaluation npx -- -y @trustedskills/guia-matthieu-agent-evaluation

2. Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "guia-matthieu-agent-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/guia-matthieu-agent-evaluation"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.
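
If you prefer to script the manual step, the following is a minimal sketch of merging the entry shown above into an existing settings file without overwriting other keys. The helper name `add_mcp_server` is hypothetical and not part of the skill or the claude CLI; it only reads and writes JSON.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path, name, command, args):
    """Merge one MCP server entry into a JSON settings file, keeping other keys.

    Hypothetical helper for illustration; it is not provided by the skill.
    """
    path = Path(settings_path).expanduser()
    # Reuse the existing settings when the file exists and is non-empty.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" object if it is missing, then add or replace the entry.
    servers = settings.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `add_mcp_server("~/.claude/settings.json", "guia-matthieu-agent-evaluation", "npx", ["-y", "@trustedskills/guia-matthieu-agent-evaluation"])` would produce the same entry as the JSON above. Back up the file first if it already contains settings you care about.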

About This Skill

What it does

The guia-matthieu-agent-evaluation skill evaluates AI agents against criteria you provide. It assesses an agent's performance against specific benchmarks and produces a structured evaluation report that highlights strengths, weaknesses, and areas for improvement.

When to use it

  • Agent Performance Review: Regularly assess how well your agent is performing its tasks.
  • Benchmark Comparison: Compare different agents or versions of an agent against a standardized set of criteria.
  • Debugging Agent Issues: Pinpoint specific areas where an agent struggles and requires further training or adjustment.
  • Iterative Improvement: Track the impact of changes made to an agent’s design or training data over time.

Key capabilities

  • Agent evaluation based on defined criteria.
  • Structured report generation.
  • Performance assessment against benchmarks.
  • Identification of strengths and weaknesses.
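
To make these capabilities concrete, here is a minimal sketch of how criteria-based scoring could be combined into a structured report. This is an illustration only, not the skill's actual implementation: the `Criterion` class, the `build_report` function, and the 0.0 to 1.0 score scale are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str         # what is being judged, e.g. "accuracy"
    weight: float     # relative importance; weights need not sum to 1
    threshold: float  # scores at or above this count as a strength


def build_report(criteria, scores):
    """Combine per-criterion scores (assumed 0.0 to 1.0) into a structured report."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    # Weighted average of the scores across all criteria.
    overall = sum(c.weight * scores[c.name] for c in criteria) / total_weight
    strengths = [c.name for c in criteria if scores[c.name] >= c.threshold]
    weaknesses = [c.name for c in criteria if scores[c.name] < c.threshold]
    return {
        "overall": round(overall, 3),
        "strengths": strengths,
        "weaknesses": weaknesses,
    }
```

For example, with an "accuracy" criterion (weight 2, threshold 0.8) scored 0.9 and a "clarity" criterion (weight 1, threshold 0.7) scored 0.6, the report lists accuracy as a strength and clarity as a weakness. Giving each criterion an explicit threshold is one way to keep the benchmark well-defined.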

Example prompts

  • "Evaluate this agent's response to the following prompt: [prompt text]."
  • "Assess the agent’s ability to summarize this document: [document content]."
  • "Compare Agent A's performance on task X with Agent B's."
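
The comparison prompt above can be thought of as a per-criterion diff between two sets of scores. The sketch below assumes each agent has already been scored on named criteria; `compare_agents` and its `tolerance` parameter are hypothetical names, not part of the skill.

```python
def compare_agents(scores_a, scores_b, tolerance=0.05):
    """Return, for each criterion both agents share, the score gap and the winner."""
    comparison = {}
    # Only criteria scored for both agents can be compared.
    for name in sorted(scores_a.keys() & scores_b.keys()):
        delta = scores_a[name] - scores_b[name]
        if abs(delta) <= tolerance:
            winner = "tie"  # gap too small to call
        elif delta > 0:
            winner = "A"
        else:
            winner = "B"
        comparison[name] = {"delta": round(delta, 3), "winner": winner}
    return comparison
```

Treating small gaps as ties is a design choice: it avoids reading meaning into differences that are likely within the noise of the scoring itself.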

Tips & gotchas

The quality of the evaluation depends heavily on the clarity and specificity of the criteria provided. Ensure your benchmarks are well-defined for accurate and meaningful results.

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License: (not listed)
Author: guia-matthieu
Installs: 14

🌐 Community

Passed automated security scans.