Write Judge Prompt
Critically evaluates and refines prompts for large language models to maximize output quality and relevance.
Install on your platform
We auto-selected Claude Code based on this skill’s supported platforms.
Run in terminal (recommended)
claude mcp add write-judge-prompt -- npx -y @trustedskills/write-judge-prompt
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "write-judge-prompt": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/write-judge-prompt"
      ]
    }
  }
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
The write-judge-prompt skill generates structured evaluation instructions to assess AI agent outputs against specific criteria. It transforms raw task requirements into precise rubrics that guide an evaluator model in scoring responses accurately and consistently.
When to use it
- Automated grading: Create objective scoring rules for automated tests or benchmark suites.
- Quality assurance: Define checklists to verify if an agent's output meets safety, format, or accuracy standards.
- Model alignment: Craft specific constraints to ensure generated text adheres to complex domain rules (e.g., legal formatting).
- Iterative improvement: Generate feedback prompts to analyze why a previous agent response succeeded or failed.
Key capabilities
- Converts natural language task descriptions into formal evaluation rubrics.
- Supports multi-criteria scoring with weighted importance for different aspects of the output.
- Generates clear pass/fail conditions for binary decision-making in evaluation pipelines.
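The weighted multi-criteria scoring described above can be sketched in a few lines. This is a minimal illustration of how an evaluation pipeline might combine a judge's per-criterion scores; the rubric fields, weights, and scale are hypothetical, not the skill's actual output format.

```python
# Hypothetical rubric: criterion names, weights, and scores are illustrative.
rubric = [
    {"criterion": "accuracy", "weight": 0.5, "score": 4},  # scores on a 0-5 scale
    {"criterion": "format",   "weight": 0.3, "score": 5},
    {"criterion": "tone",     "weight": 0.2, "score": 3},
]

def weighted_score(rubric, max_score=5):
    """Combine per-criterion scores into a single 0-1 value."""
    total = sum(item["weight"] * item["score"] for item in rubric)
    return total / max_score

def passes(rubric, threshold=0.8):
    """Binary pass/fail decision for an evaluation pipeline."""
    return weighted_score(rubric) >= threshold

print(round(weighted_score(rubric), 2))  # 0.82
print(passes(rubric))                    # True
```

Weights should sum to 1.0 so the combined score stays on the same 0-1 scale regardless of how many criteria the rubric defines.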
Example prompts
- "Create a judge prompt to evaluate if a Python script correctly parses CSV files while handling missing values."
- "Write an evaluation rubric for an AI agent that summarizes medical articles, focusing on accuracy and tone neutrality."
- "Generate a scoring guide to assess whether a coding assistant's explanation is clear for a beginner audience."
Tips & gotchas
Ensure your input task description includes explicit success criteria; vague goals will result in weak judge prompts. This skill works best when paired with a separate generation prompt, creating a closed-loop evaluation system where the agent creates content and the judge validates it.
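The closed-loop pattern mentioned above can be sketched as a generate-judge-revise cycle. The `generate` and `judge` functions here are hypothetical stand-ins for calls to a generation model and to an evaluator model running a judge prompt produced by this skill.

```python
# Sketch of a closed-loop evaluation cycle, assuming two model-call stubs.
def generate(task, feedback=None):
    # Placeholder: a real implementation would call the generation model,
    # incorporating judge feedback into the prompt on revision rounds.
    return f"draft for {task!r}" + (" (revised)" if feedback else "")

def judge(output):
    # Placeholder: a real implementation would run the judge prompt and
    # return a (passed, feedback) pair from the evaluator model.
    return ("revised" in output, "tighten the summary")

def refine(task, max_rounds=3):
    """Generate, judge, and revise until the output passes or rounds run out."""
    feedback = None
    for _ in range(max_rounds):
        output = generate(task, feedback)
        passed, feedback = judge(output)
        if passed:
            return output
    return output
```

Capping the number of rounds keeps the loop from cycling indefinitely when the judge's criteria are unsatisfiable for a given task.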
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |