Tbench
Tbench automatically generates diverse test cases for your code, improving reliability and catching edge-case bugs quickly.
Install on your platform
Run in terminal (recommended)
claude mcp add tbench npx -- -y @trustedskills/tbench
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "tbench": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/tbench"
      ]
    }
  }
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
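If you would rather script the manual step, the sketch below merges the same tbench entry into a settings file without overwriting other servers already listed there. The helper name add_mcp_server is hypothetical, and the code assumes the file is plain JSON with the mcpServers layout shown above; it is not part of tbench or the claude CLI.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path, name, command, args):
    """Merge one MCP server entry into a JSON settings file (hypothetical helper)."""
    path = Path(settings_path).expanduser()
    # Start from the existing settings if the file is present, else from an empty object.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = settings.setdefault("mcpServers", {})
    # Overwrite only the entry with this name; other servers are left untouched.
    servers[name] = {"command": command, "args": list(args)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling add_mcp_server("~/.claude/settings.json", "tbench", "npx", ["-y", "@trustedskills/tbench"]) would reproduce the manual edit. Back up the file first, since the helper rewrites it in place.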
About This Skill
What it does
The tbench skill provides a simple benchmarking tool for AI agents. It allows users to run predefined tests and measure performance metrics, offering insights into an agent's capabilities. This helps evaluate and compare different agent configurations or track improvements over time.
When to use it
- Performance Evaluation: Assess the speed and accuracy of your AI agent on specific tasks.
- Regression Testing: Ensure new changes don’t negatively impact existing functionality by running benchmarks before and after updates.
- Configuration Tuning: Experiment with different settings or parameters to optimize an agent's performance.
- Comparison Across Models: Compare the effectiveness of various AI models in a standardized environment.
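The regression-testing use case amounts to comparing two sets of benchmark results. This page does not document tbench's result format, so the sketch below assumes each run has been reduced to a plain mapping from benchmark name to a score where higher is better. The function name find_regressions, the 5% tolerance, and the sample numbers are all illustrative.

```python
def find_regressions(before, after, tolerance=0.05):
    """Return benchmarks whose score dropped by more than `tolerance` (relative).

    Assumes higher scores are better and that both mappings use the same
    benchmark names; this result format is an assumption, not tbench's own.
    """
    regressions = {}
    for name, old_score in before.items():
        if name not in after:
            continue  # benchmark missing from the new run: nothing to compare
        new_score = after[name]
        # Relative drop; a zero baseline is treated as "no drop" to avoid dividing by zero.
        drop = (old_score - new_score) / old_score if old_score else 0.0
        if drop > tolerance:
            regressions[name] = (old_score, new_score)
    return regressions


# Illustrative numbers only, keyed by benchmark names used in the example prompts on this page.
before = {"basic_math": 0.90, "logic_puzzle": 0.80}
after = {"basic_math": 0.91, "logic_puzzle": 0.70}
print(find_regressions(before, after))
```

A relative tolerance is used so that small run-to-run noise is not reported as a regression; the right threshold depends on how stable your benchmarks are.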
Key capabilities
- Predefined benchmark tests
- Performance metric measurement
- Standardized testing environment
Example prompts
- "Run the 'basic_math' benchmark."
- "Execute all available benchmarks and report results."
- "Compare the performance of agent A versus agent B on the 'logic_puzzle' test."
Tips & gotchas
The tbench skill needs a properly configured AI agent environment to function correctly. Results are meaningful only when compared under the same testing conditions, because differences in hardware or software can skew them.
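One way to act on this is to record the conditions next to each result and decline to compare runs whose conditions differ. The sketch below uses only Python's standard platform module; the choice of fields and the names environment_fingerprint and comparable are assumptions for illustration and are not part of tbench.

```python
import platform


def environment_fingerprint():
    """Collect a few host details that commonly affect benchmark numbers."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


def comparable(env_a, env_b):
    """Treat two runs as comparable only if every recorded field matches."""
    return env_a == env_b


current = environment_fingerprint()
print(current)
print(comparable(current, environment_fingerprint()))
```

Storing the fingerprint alongside each saved result makes it easy to spot later that two numbers came from different machines or software versions.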
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Audit | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.