AWS Bedrock Evals
Evaluates models deployed on AWS Bedrock against predefined metrics and datasets.
Install on your platform
Run in terminal (recommended)
claude mcp add aws-bedrock-evals npx -- -y @trustedskills/aws-bedrock-evals
Or manually add to ~/.claude/settings.json
{
"mcpServers": {
"aws-bedrock-evals": {
"command": "npx",
"args": [
"-y",
"@trustedskills/aws-bedrock-evals"
]
}
}
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill allows AI agents to evaluate models deployed on AWS Bedrock. It provides a mechanism for assessing model performance based on predefined metrics and datasets, enabling users to track quality over time and compare different models. The skill facilitates automated evaluation workflows within an agent's operational pipeline.
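As a rough illustration of what "assessing model performance based on predefined metrics and datasets" can mean, here is a minimal sketch of an exact-match accuracy metric. It does not call Bedrock or reflect the skill's actual implementation: `exact_match_accuracy`, `fake_model`, and the `prompt`/`reference` field names are hypothetical, and the four-row dataset is made up for the example.

```python
from typing import Callable, Iterable, Mapping


def exact_match_accuracy(
    examples: Iterable[Mapping[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of examples whose prediction equals the reference.

    Comparison ignores surrounding whitespace and letter case.
    """
    rows = list(examples)
    if not rows:
        raise ValueError("dataset is empty")
    hits = sum(
        1
        for row in rows
        if predict(row["prompt"]).strip().lower() == row["reference"].strip().lower()
    )
    return hits / len(rows)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real run would invoke Bedrock."""
    return "positive" if "love" in prompt.lower() else "negative"


# Made-up evaluation rows, for illustration only.
dataset = [
    {"prompt": "I love this product", "reference": "positive"},
    {"prompt": "This is terrible", "reference": "negative"},
    {"prompt": "Not bad at all", "reference": "positive"},
    {"prompt": "I love waiting on hold", "reference": "negative"},
]

accuracy = exact_match_accuracy(dataset, fake_model)
print(f"accuracy = {accuracy:.2f}")
```

Passing the model in as a plain callable keeps the metric independent of any one provider, so the same function can score several candidate models against the same dataset.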
When to use it
- Model Selection: Compare the performance of multiple foundation models available through Bedrock before committing to a specific one for your application.
- Continuous Monitoring: Regularly evaluate deployed models to detect degradation in performance due to data drift or other factors.
- A/B Testing: Evaluate different model versions or configurations against each other to optimize for specific metrics.
- Automated Reporting: Generate reports on model evaluation results, providing insights into model health and potential areas for improvement.
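The model selection and continuous monitoring cases above both reduce to simple comparisons once scores exist. The sketch below assumes you already have one numeric score per evaluation run; `best_model`, `has_degraded`, the 0.05 tolerance, and all score values are hypothetical choices for illustration, not part of the skill.

```python
from statistics import mean
from typing import Mapping, Sequence


def best_model(scores: Mapping[str, float]) -> str:
    """Return the model name with the highest score."""
    if not scores:
        raise ValueError("no scores to compare")
    return max(scores, key=lambda name: scores[name])


def has_degraded(
    history: Sequence[float], latest: float, tolerance: float = 0.05
) -> bool:
    """True when the latest score is more than `tolerance` below the
    mean of earlier scores."""
    if not history:
        raise ValueError("history is empty")
    return (mean(history) - latest) > tolerance


# Made-up scores, for illustration only.
candidate_scores = {"model-a": 0.81, "model-b": 0.86}
weekly_history = [0.90, 0.92, 0.91]

print(best_model(candidate_scores))
print(has_degraded(weekly_history, latest=0.84))
print(has_degraded(weekly_history, latest=0.89))
```

A fixed tolerance is the simplest possible rule. With noisy metrics or small datasets, a statistical test or a rolling window may be a better fit.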
Key capabilities
- Evaluates models deployed on AWS Bedrock.
- Uses predefined metrics for assessment.
- Leverages datasets for evaluation.
- Supports automated evaluation workflows.
Example prompts
- "Evaluate the 'anthropic.claude-v2' model using the provided dataset and report accuracy."
- "Run a performance comparison between 'cohere.command-text-v14' and 'meta.llama2-13b-chat' on the sentiment analysis benchmark."
- "Generate a weekly report of evaluation metrics for all deployed models."
Tips & gotchas
- Requires appropriate AWS credentials configured within the agent environment to access Bedrock resources.
- Evaluation results are only as reliable as the datasets behind them: choose datasets that are high quality and relevant to your use case.
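For the credentials requirement, a quick pre-flight check can catch an unconfigured environment before an evaluation starts. This sketch only inspects the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`). AWS SDKs can also resolve credentials from shared config files, SSO, or instance roles, so a negative result here is a hint rather than proof that access will fail. The function name `describe_env_credentials` is hypothetical.

```python
import os
from typing import Mapping, Optional


def describe_env_credentials(env: Optional[Mapping[str, str]] = None) -> str:
    """Report which credential source, if any, is visible in the environment.

    Only environment variables are checked; other credential sources
    (shared config files, SSO, instance roles) are not inspected.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "static keys from environment"
    if env.get("AWS_PROFILE"):
        return f"named profile: {env['AWS_PROFILE']}"
    return "no credentials in environment"


print(describe_env_credentials())
```

Accepting the environment as a parameter makes the check easy to exercise without touching the real process environment.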
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.