AWS Bedrock Evals

Author: antstackio

🌐Community

Evaluates models deployed on AWS Bedrock against predefined metrics and datasets, so agents can track model quality over time and compare models.

Install on your platform

Run in terminal (recommended)

claude mcp add aws-bedrock-evals npx -- -y @trustedskills/aws-bedrock-evals

Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "aws-bedrock-evals": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/aws-bedrock-evals"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets AI agents evaluate models deployed on AWS Bedrock. It scores model performance against predefined metrics and datasets, so users can track quality over time and compare different models. Evaluations can run as automated workflows within an agent's operational pipeline.
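To make "assessing model performance based on predefined metrics and datasets" concrete, here is a minimal sketch that scores a model against a small labeled dataset using exact-match accuracy. The skill's own interface is not documented on this page, so `EvalExample`, `exact_match_accuracy`, and `fake_model` are hypothetical names chosen for illustration; in practice the `model` callable would wrap a Bedrock invocation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalExample:
    """One dataset row: a prompt and the expected answer (hypothetical schema)."""
    prompt: str
    expected: str


def exact_match_accuracy(model: Callable[[str], str],
                         dataset: Iterable[EvalExample]) -> float:
    """Return the fraction of examples whose output matches the expected answer.

    The comparison ignores case and surrounding whitespace.
    """
    total = 0
    correct = 0
    for example in dataset:
        total += 1
        output = model(example.prompt)
        if output.strip().lower() == example.expected.strip().lower():
            correct += 1
    if total == 0:
        raise ValueError("dataset is empty; accuracy is undefined")
    return correct / total


# Stand-in model for demonstration only; a real callable would invoke Bedrock.
def fake_model(prompt: str) -> str:
    return "positive" if "love" in prompt else "negative"


dataset = [
    EvalExample("I love this product", "positive"),
    EvalExample("This is terrible", "negative"),
    EvalExample("Not bad at all", "positive"),
]
score = exact_match_accuracy(fake_model, dataset)
print(f"accuracy: {score:.2f}")
```

Keeping the model behind a plain callable means the same metric code can be reused for any model ID, and swapped for a stub in tests.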

When to use it

Model Selection: Compare the performance of multiple foundation models available through Bedrock before committing to a specific one for your application.
Continuous Monitoring: Regularly evaluate deployed models to detect degradation in performance due to data drift or other factors.
A/B Testing: Evaluate different model versions or configurations against each other to optimize for specific metrics.
Automated Reporting: Generate reports on model evaluation results, providing insights into model health and potential areas for improvement.
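The model-selection and continuous-monitoring use cases above can be sketched as two small helpers: one picks the highest-scoring model, the other flags a run that falls noticeably below earlier runs. This is an illustration under stated assumptions, not the skill's implementation: `best_model`, `detect_degradation`, the `tolerance` default, and every score shown are hypothetical placeholders rather than measured results.

```python
from typing import Dict, List


def best_model(scores: Dict[str, float]) -> str:
    """Return the model ID with the highest score (ties go to the first entry)."""
    if not scores:
        raise ValueError("no scores to compare")
    return max(scores, key=scores.get)


def detect_degradation(history: List[float], tolerance: float = 0.05) -> bool:
    """Return True if the latest score is more than `tolerance` below
    the mean of all earlier scores. With fewer than two runs, return False."""
    if len(history) < 2:
        return False
    *earlier, latest = history
    baseline = sum(earlier) / len(earlier)
    return (baseline - latest) > tolerance


# Placeholder scores for illustration only; these are not real benchmark results.
candidate_scores = {
    "anthropic.claude-v2": 0.91,
    "cohere.command-text-v14": 0.87,
}
print(best_model(candidate_scores))

# Placeholder weekly accuracy history for one deployed model.
weekly_accuracy = [0.90, 0.92, 0.80]
print(detect_degradation(weekly_accuracy))
```

Comparing against the mean of earlier runs, rather than only the previous run, makes the check less sensitive to a single noisy evaluation.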

Key capabilities

Evaluates models deployed on AWS Bedrock.
Uses predefined metrics for assessment.
Leverages datasets for evaluation.
Supports automated evaluation workflows.

Example prompts

"Evaluate the 'anthropic.claude-v2' model using the provided dataset and report accuracy."
"Run a performance comparison between 'cohere.command-text-v14' and 'meta.llama2-13b-chat' on the sentiment analysis benchmark."
"Generate a weekly report of evaluation metrics for all deployed models."

Tips & gotchas

Requires appropriate AWS credentials configured within the agent environment to access Bedrock resources.
The accuracy of evaluations depends heavily on the quality and relevance of the chosen datasets.
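Both tips can be turned into quick pre-flight checks before an evaluation run. The sketch below is one possible approach, assuming a dataset of rows with `prompt` and `expected` fields (a hypothetical schema; the skill's real dataset format is not described here). The credential check only looks at the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_PROFILE` environment variables; it does not cover other credential sources and does not confirm that the credentials are valid or allowed to call Bedrock.

```python
import os
from typing import List, Mapping, Optional


def has_aws_credentials_hint(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment names a key pair or a profile.

    This is a quick pre-check only: it ignores other credential sources
    (such as instance roles) and never contacts AWS.
    """
    env = os.environ if env is None else env
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    has_profile = bool(env.get("AWS_PROFILE"))
    return has_keys or has_profile


def dataset_problems(rows: List[Mapping[str, str]]) -> List[str]:
    """Describe basic quality problems: empty fields and duplicate prompts.

    The `prompt` / `expected` field names are assumptions for this sketch.
    """
    problems: List[str] = []
    seen = set()
    for index, row in enumerate(rows):
        prompt = row.get("prompt", "").strip()
        expected = row.get("expected", "").strip()
        if not prompt:
            problems.append(f"row {index}: empty prompt")
        if not expected:
            problems.append(f"row {index}: empty expected answer")
        if prompt and prompt in seen:
            problems.append(f"row {index}: duplicate prompt")
        seen.add(prompt)
    return problems


rows = [
    {"prompt": "Classify: great service", "expected": "positive"},
    {"prompt": "Classify: great service", "expected": ""},
]
for problem in dataset_problems(rows):
    print(problem)
```

Checks like these catch only mechanical issues; whether a dataset is relevant to the application still needs human review.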

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License
Author: antstackio
Installs: 6

Passed automated security scans.