Nemo Evaluator
Nemo Evaluator assesses the quality of generated text based on a defined prompt, ensuring outputs align with desired criteria and improving consistency.
Install on your platform
Run in terminal (recommended)
claude mcp add nemo-evaluator npx -- -y @trustedskills/nemo-evaluator
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "nemo-evaluator": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/nemo-evaluator"
      ]
    }
  }
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
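If you would rather script the manual edit than paste the JSON by hand, the merge could look roughly like this. It is a minimal sketch, assuming the settings file is plain JSON at the path given above; `add_mcp_server` and `update_settings_file` are hypothetical helpers written for illustration, not part of the claude CLI.

```python
import json
from pathlib import Path


def add_mcp_server(settings: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `settings` with one entry added under "mcpServers".

    Existing keys and other registered servers are preserved.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Read the JSON file at `path` (if present), add the entry, write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    updated = add_mcp_server(
        current,
        "nemo-evaluator",
        "npx",
        ["-y", "@trustedskills/nemo-evaluator"],
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Calling `update_settings_file(Path.home() / ".claude" / "settings.json")` would apply the change. Back up the file first, since the sketch rewrites it in full.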
About This Skill
What it does
The nemo-evaluator skill evaluates text generation models using NVIDIA's NeMo framework. It runs inference on the models you specify and scores their outputs against predefined metrics, which makes it easier to compare models and to spot areas for improvement during development or deployment.
When to use it
- Model Comparison: Evaluate multiple text generation models (e.g., summarization, translation) against each other to determine which performs best for a given task.
- Performance Monitoring: Track the performance of a deployed model over time and identify potential degradation in quality.
- A/B Testing: Compare different versions of a model or prompting strategies to optimize output quality.
- Automated Evaluation Pipelines: Integrate into automated workflows for continuous model evaluation and improvement.
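The performance-monitoring case above comes down to comparing recent scores with a baseline. The following is a minimal sketch of that check, assuming you already have per-sample quality scores from an evaluation run where higher is better. The function name `has_degraded` and the default tolerance are illustrative assumptions, not part of the skill.

```python
from statistics import mean


def has_degraded(baseline_scores, recent_scores, tolerance=0.05):
    """Return True if the mean recent score falls more than `tolerance`
    below the mean baseline score (higher scores are assumed to be better)."""
    return mean(baseline_scores) - mean(recent_scores) > tolerance


# Hypothetical scores from two evaluation runs of the same deployed model.
baseline = [0.80, 0.82, 0.78]
this_week = [0.70, 0.72, 0.68]
if has_degraded(baseline, this_week):
    print("Quality dropped beyond tolerance; investigate before the next release.")
```

A fixed tolerance is the simplest possible rule. A production pipeline would likely also account for sample size and score variance.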
Key capabilities
- Model inference using NVIDIA NeMo
- Performance metric calculation
- Comparison of model outputs
- Integration with automated pipelines
Example prompts
- "Evaluate the summarization performance of Model A versus Model B on this dataset."
- "Run inference with the 'translation' model and calculate BLEU score."
- "Compare the output quality of these two models using ROUGE metrics."
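To make the ROUGE comparison in the last prompt concrete, here is a simplified, pure-Python ROUGE-1 F1 (unigram overlap) used to compare two models on the same references. It is a sketch under stated assumptions: whitespace tokenization, lowercase matching, and no stemming. It is not the skill's or NeMo's implementation, and `rouge1_f1` and `compare_models` are hypothetical names.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def compare_models(outputs_a, outputs_b, references):
    """Return the mean ROUGE-1 F1 of each model over the same references."""
    n = len(references)
    mean_a = sum(rouge1_f1(o, r) for o, r in zip(outputs_a, references)) / n
    mean_b = sum(rouge1_f1(o, r) for o, r in zip(outputs_b, references)) / n
    return {"model_a": mean_a, "model_b": mean_b}
```

For reported results, prefer an established metric implementation over this sketch, since tokenization and stemming choices change the scores.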
Tips & gotchas
- Requires NVIDIA NeMo to be installed and configured.
- Ensure that the specified models are compatible with the nemo-evaluator skill's supported architectures.
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Auditor | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.