Evaluate RAG

🌐 Community
by hamelsmu · vlatest · Repository

Evaluates Retrieval-Augmented Generation (RAG) responses for accuracy and relevance, improving RAG system quality and reliability.

Install on your platform


1. Run in terminal (recommended):

    claude mcp add evaluate-rag npx -- -y @trustedskills/evaluate-rag
2. Or manually add to ~/.claude/settings.json:
{
  "mcpServers": {
    "evaluate-rag": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/evaluate-rag"
      ]
    }
  }
}

Requires Claude Code (the claude CLI). Run claude --version to verify your installation.

About This Skill

The evaluate-rag skill provides a framework to assess Retrieval-Augmented Generation (RAG) pipelines by comparing AI agent outputs against ground truth data. It automatically calculates metrics like accuracy and relevance to determine how well an agent retrieves and utilizes information from its knowledge base.
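The skill's internal metric definitions are not published on this page, but the ground-truth comparison it describes can be sketched in a few lines. The function name, dataset shape, and exact-match scoring below are illustrative assumptions, not the skill's actual API:

```python
# Minimal sketch of ground-truth answer accuracy (illustrative only;
# real RAG evaluators typically use fuzzier matching or LLM-based grading).

def exact_match_accuracy(predictions, ground_truth):
    """Fraction of agent answers that match the reference exactly (case-insensitive)."""
    assert len(predictions) == len(ground_truth), "datasets must align one-to-one"
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return hits / len(ground_truth)

preds = ["Paris", "42", "blue"]
refs  = ["paris", "42", "green"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 match -> 0.666...
```

Exact match is the strictest possible scorer; in practice a relevance metric would tolerate paraphrases, which is why a curated ground-truth set matters more than the scoring function itself.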

When to use it

  • Validating the performance of a RAG system before deploying it to production environments.
  • Debugging retrieval issues where the AI answers incorrectly despite having access to relevant documents.
  • Benchmarking different embedding models or chunking strategies to see which yields higher accuracy scores.
  • Continuously monitoring agent health over time to detect drift in data quality or retrieval logic.

Key capabilities

  • Executes automated evaluation tests on RAG pipelines using predefined ground truth datasets.
  • Generates quantitative metrics to score the precision and recall of retrieved context.
  • Facilitates iterative improvement by highlighting specific failures in information retrieval or synthesis.
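Scoring retrieval precision and recall, as described above, reduces to set overlap between the chunks the agent retrieved and the chunks the ground truth marks as relevant. A minimal sketch, assuming chunk IDs as the unit of comparison (the skill's real scoring granularity may differ):

```python
# Sketch of context-level precision/recall over retrieved chunk IDs
# (the IDs and metric granularity here are assumptions).

def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Precision and recall of retrieved chunks against the known-relevant set."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = retrieval_precision_recall(["doc1", "doc3"], ["doc1", "doc2"])
print(p, r)  # 1 hit of 2 retrieved, 1 of 2 relevant -> 0.5 0.5
```

Low precision with high recall suggests over-broad retrieval (noisy context); high precision with low recall suggests the retriever is missing relevant documents, which points at embedding or chunking choices.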

Example prompts

  • "Run an evaluation test on my customer support bot's RAG pipeline using the provided Q&A dataset."
  • "Compare the accuracy of two different embedding models for my legal document search agent."
  • "Generate a report on why my AI agent is failing to retrieve recent policy updates from the knowledge base."

Tips & gotchas

Ensure you have a reliable ground truth dataset ready, since the skill scores outputs by comparing them against known correct answers. The tool is designed specifically for RAG architectures; it will not evaluate a standalone LLM that lacks an external retrieval component.
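The exact dataset schema the skill expects is not documented on this page, but a ground-truth set for RAG evaluation typically pairs each question with a reference answer and the documents that support it. A hypothetical example (field names are assumptions):

```json
[
  {
    "question": "What is our refund window?",
    "reference_answer": "30 days from the date of purchase.",
    "relevant_doc_ids": ["policy-refunds-v2"]
  },
  {
    "question": "Do we ship internationally?",
    "reference_answer": "Yes, to most countries, with a 7-14 day delivery estimate.",
    "relevant_doc_ids": ["shipping-faq", "shipping-zones"]
  }
]
```

Including the relevant document IDs alongside the answers lets an evaluator separate retrieval failures (wrong documents fetched) from synthesis failures (right documents, wrong answer).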

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License:
Author: hamelsmu
Installs: 48


Passed automated security scans.