AWS Bedrock Evals

🌐 Community
by antstackio · vlatest

Provides AWS guidance and assistance for deploying and managing cloud infrastructure.

Install on your platform

1. Run in terminal (recommended)

claude mcp add aws-bedrock-evals npx -- -y @trustedskills/aws-bedrock-evals

2. Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "aws-bedrock-evals": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/aws-bedrock-evals"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.
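
If the settings file already holds other entries, the manual step can be done programmatically so that nothing else is overwritten. The following is a minimal sketch, not part of the skill: `add_mcp_server` is a hypothetical helper name, and it assumes the file (if present) contains valid JSON.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one MCP server entry into a settings file, keeping other keys intact."""
    # Start from the existing settings if the file is present, else an empty dict.
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    # Create the "mcpServers" section on first use, then add or replace this entry.
    servers = settings.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    # Write the merged settings back, creating the parent directory if needed.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Calling `add_mcp_server(Path.home() / ".claude" / "settings.json", "aws-bedrock-evals", "npx", ["-y", "@trustedskills/aws-bedrock-evals"])` would produce the same entry as the JSON shown above.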

About This Skill

What it does

This skill lets AI agents evaluate models deployed on AWS Bedrock. It assesses model performance against predefined metrics and datasets, so users can track quality over time and compare models. It also supports automated evaluation workflows within an agent's operational pipeline.
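
To make "predefined metrics and datasets" concrete, here is a small illustrative sketch of one possible metric, exact-match accuracy. It does not reflect the skill's actual implementation: the record fields `prompt` and `reference`, the function names, and the stand-in `fake_model` (used in place of a real Bedrock call) are all assumptions.

```python
from typing import Callable, Iterable


def exact_match_accuracy(generate: Callable[[str], str], dataset: Iterable[dict]) -> float:
    """Fraction of records whose model output equals the reference answer."""
    total = 0
    correct = 0
    for record in dataset:
        prediction = generate(record["prompt"])
        # Compare case-insensitively and ignore surrounding whitespace.
        if prediction.strip().lower() == record["reference"].strip().lower():
            correct += 1
        total += 1
    if total == 0:
        raise ValueError("dataset is empty")
    return correct / total


# Hypothetical stand-in for a real model call; always answers "positive".
def fake_model(prompt: str) -> str:
    return "positive"


# Made-up sample records, for illustration only.
sample = [
    {"prompt": "Review: great product", "reference": "positive"},
    {"prompt": "Review: broke in a day", "reference": "negative"},
]
print(exact_match_accuracy(fake_model, sample))  # prints 0.5
```

Passing the model as a plain callable keeps the metric independent of how the model is invoked, so the same function could score any model that maps a prompt to text.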

When to use it

  • Model Selection: Compare the performance of multiple foundation models available through Bedrock before committing to a specific one for your application.
  • Continuous Monitoring: Regularly evaluate deployed models to detect degradation in performance due to data drift or other factors.
  • A/B Testing: Evaluate different model versions or configurations against each other to optimize for specific metrics.
  • Automated Reporting: Generate reports on model evaluation results, providing insights into model health and potential areas for improvement.
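
The model-selection and continuous-monitoring cases above both reduce to comparing scores. The sketch below shows one simple way to do that, assuming each evaluation yields a single score between 0 and 1. The function names, the 0.05 tolerance, and all numbers are placeholders, not measured results or part of the skill.

```python
def detect_degradation(history: list[float], latest: float, tolerance: float = 0.05) -> bool:
    """True when the latest score is more than `tolerance` below the historical mean."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(history) / len(history)
    return latest < baseline - tolerance


def best_model(scores: dict[str, float]) -> str:
    """Name of the model with the highest score."""
    if not scores:
        raise ValueError("no scores to compare")
    return max(scores, key=scores.get)


# Placeholder values for illustration only.
past_scores = [0.90, 0.92, 0.91]
print(detect_degradation(past_scores, latest=0.80))  # True
print(detect_degradation(past_scores, latest=0.89))  # False
print(best_model({"model-a": 0.81, "model-b": 0.78}))  # model-a
```

A fixed tolerance around the mean is the simplest possible rule; a real monitoring setup might instead account for the variance of past scores.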

Key capabilities

  • Evaluates models deployed on AWS Bedrock.
  • Uses predefined metrics for assessment.
  • Leverages datasets for evaluation.
  • Supports automated evaluation workflows.

Example prompts

  • "Evaluate the 'anthropic.claude-v2' model using the provided dataset and report accuracy."
  • "Run a performance comparison between 'cohere.command-text-v14' and 'meta.llama2-13b-chat' on the sentiment analysis benchmark."
  • "Generate a weekly report of evaluation metrics for all deployed models."

Tips & gotchas

  • Requires appropriate AWS credentials configured within the agent environment to access Bedrock resources.
  • The accuracy of evaluations depends heavily on the quality and relevance of the chosen datasets.
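
Both tips can be turned into a quick pre-flight check before running an evaluation. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the dataset is assumed to be JSON Lines with `prompt` and `reference` fields, and the credential check is only a heuristic based on the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_PROFILE` environment variables (credentials may also come from a shared credentials file or an attached role, which this does not detect).

```python
import json
import os


def has_aws_credentials(env=os.environ) -> bool:
    """Heuristic: True if static keys or a named profile are set in the environment."""
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    return has_keys or bool(env.get("AWS_PROFILE"))


def invalid_records(jsonl_text: str, required=("prompt", "reference")) -> list[int]:
    """1-based line numbers of records that are malformed or lack a required field."""
    bad = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        # A usable record is a JSON object with a non-empty value for each required field.
        if not isinstance(record, dict) or any(not record.get(key) for key in required):
            bad.append(number)
    return bad
```

For example, `invalid_records('{"prompt": "a", "reference": "b"}\nnot json')` returns `[2]`, pointing at the line that needs fixing.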

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License: (not listed)
Author: antstackio
Installs: 6

🌐 Community

Passed automated security scans.