Agent Evaluation
Evaluates Guia-Matthieu agents based on predefined metrics, providing actionable feedback for performance improvement.
Install on your platform
Run in terminal (recommended)
claude mcp add guia-matthieu-agent-evaluation npx -- -y @trustedskills/guia-matthieu-agent-evaluation
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "guia-matthieu-agent-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/guia-matthieu-agent-evaluation"
      ]
    }
  }
}
Requires Claude Code (the claude CLI). Run claude --version to verify your install.
About This Skill
What it does
The guia-matthieu-agent-evaluation skill evaluates AI agents against criteria you provide. It assesses an agent's performance against specific benchmarks and produces a structured evaluation report, which helps identify the agent's strengths, weaknesses, and areas for improvement.
When to use it
- Agent Performance Review: Regularly assess how well your agent is performing its tasks.
- Benchmark Comparison: Compare different agents or versions of an agent against a standardized set of criteria.
- Debugging Agent Issues: Pinpoint specific areas where an agent struggles and requires further training or adjustment.
- Iterative Improvement: Track the impact of changes made to an agent’s design or training data over time.
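Tracking the impact of a change comes down to comparing scores from two evaluation runs. The sketch below is an illustration only, not part of the skill: the function name `score_deltas`, the 0-to-1 score scale, and the sample numbers are all assumptions.

```python
def score_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-criterion change between two evaluation runs (after minus before)."""
    # Only compare criteria that were scored in both runs.
    shared = before.keys() & after.keys()
    return {name: round(after[name] - before[name], 3) for name in sorted(shared)}


# Hypothetical scores for two versions of the same agent.
v1 = {"accuracy": 0.70, "conciseness": 0.80}
v2 = {"accuracy": 0.85, "conciseness": 0.75}

deltas = score_deltas(v1, v2)
regressions = [name for name, change in deltas.items() if change < 0]
print(deltas, regressions)
```

Keeping the criteria identical across runs is what makes the deltas meaningful; a criterion that appears in only one run is skipped here.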
Key capabilities
- Agent evaluation based on defined criteria.
- Structured report generation.
- Performance assessment against benchmarks.
- Identification of strengths and weaknesses.
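To make "evaluation based on defined criteria" and "structured report" concrete, here is a minimal Python sketch. It does not show how the skill is implemented; `Criterion`, `build_report`, the 0-to-1 score scale, and the 0.7 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One hypothetical evaluation criterion with a relative weight."""
    name: str
    weight: float


def build_report(criteria, scores, threshold=0.7):
    """Combine per-criterion scores (0.0 to 1.0) into a structured report.

    `scores` maps criterion name -> score. The threshold separating
    strengths from weaknesses is an illustrative assumption.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    # Weighted average of the per-criterion scores.
    overall = sum(c.weight * scores[c.name] for c in criteria) / total_weight
    return {
        "overall": round(overall, 3),
        "strengths": [c.name for c in criteria if scores[c.name] >= threshold],
        "weaknesses": [c.name for c in criteria if scores[c.name] < threshold],
    }


criteria = [Criterion("accuracy", 2.0), Criterion("conciseness", 1.0)]
report = build_report(criteria, {"accuracy": 0.9, "conciseness": 0.5})
print(report)
```

A dictionary with fixed keys keeps the report easy to compare across agents or versions, which is one plausible reading of "structured" here.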
Example prompts
- "Evaluate this agent's response to the following prompt: [prompt text]."
- "Assess the agent’s ability to summarize this document: [document content]."
- "Compare Agent A's performance on task X with Agent B's."
Tips & gotchas
The quality of the evaluation depends heavily on the clarity and specificity of the criteria provided. Ensure your benchmarks are well-defined for accurate and meaningful results.
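One way to read "well-defined" is that each criterion states something that can be checked. The sketch below contrasts that with a vague criterion; the helper names, the 50-word limit, and the sample response are hypothetical and not taken from the skill.

```python
def within_word_limit(response: str, max_words: int = 50) -> bool:
    """Specific criterion: the response uses at most `max_words` words."""
    return len(response.split()) <= max_words


def mentions_all(response: str, required_terms: list[str]) -> bool:
    """Specific criterion: every required term appears (case-insensitive)."""
    lowered = response.lower()
    return all(term.lower() in lowered for term in required_terms)


# A vague criterion such as "the summary should be good" cannot be checked
# mechanically; the two functions above can.
response = "The report covers revenue growth and hiring plans for 2024."
results = {
    "within_word_limit": within_word_limit(response),
    "mentions_all": mentions_all(response, ["revenue", "hiring"]),
}
print(results)
```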
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Audit | Result |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.