Model Extraction Relu Logits
Extracts ReLU activations and logits from a model's layers to analyze internal decision-making processes.
Install on your platform
Run in terminal (recommended)
claude mcp add model-extraction-relu-logits npx -- -y @trustedskills/model-extraction-relu-logits
Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "model-extraction-relu-logits": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-extraction-relu-logits"
      ]
    }
  }
}
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill lets an AI agent inspect a language model's intermediate values, specifically its ReLU activations and logits, to analyze how the model arrives at its outputs. It is intended to provide this insight without requiring direct access to the model's weights or architecture.
When to use it
- Debugging Model Behavior: Understand why a model is producing unexpected outputs by examining ReLU activations and logits for specific inputs.
- Analyzing Decision Boundaries: Investigate how different input features influence the model's final prediction through logit analysis.
- Educational Purposes: Demonstrate the internal processes of language models to students or colleagues.
- Security Auditing (Limited): Look for patterns in ReLU activations and logits that may indicate vulnerabilities; this covers only a narrow form of auditing.
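The "Analyzing Decision Boundaries" use case above can be illustrated with a small finite-difference probe. This is a minimal sketch, not the skill's implementation: `toy_logits` is a hypothetical stand-in for a model that returns two logits, and the probe estimates how much each input feature moves the margin between them.

```python
def toy_logits(x):
    """Hypothetical stand-in model: one ReLU unit feeding two logits."""
    hidden = max(0.0, 2.0 * x[0] - 1.0 * x[1])  # ReLU over a linear mix
    return [hidden, 0.5 * x[1]]


def logit_margin(x):
    """Difference between the first and second logit."""
    first, second = toy_logits(x)
    return first - second


def margin_sensitivity(x, eps=1e-3):
    """Estimate d(margin)/d(x[i]) for each feature by bumping it by eps."""
    base = logit_margin(x)
    slopes = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        slopes.append((logit_margin(bumped) - base) / eps)
    return slopes


slopes = margin_sensitivity([1.0, 0.5])
print(slopes)
```

A positive slope means increasing that feature pushes the prediction toward the first class; a negative slope pushes it toward the second. The estimate is only locally valid, since a ReLU unit switching on or off changes the slope.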
Key capabilities
- ReLU Activation Extraction: Retrieves values from ReLU activation functions within the model.
- Logit Extraction: Provides access to the raw logit scores before softmax or other post-processing steps.
- Input Specificity: Allows targeting specific inputs for analysis.
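To make the two extracted quantities concrete, here is a minimal sketch of a tiny two-layer network that records its ReLU activations and its raw logits for one input. The weights (`W1`, `B1`, `W2`, `B2`) and the function `forward_with_trace` are hypothetical and chosen for illustration only; they do not reflect how the skill itself queries a model.

```python
import math


def relu(values):
    """Element-wise ReLU: negative values are clamped to zero."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, inputs):
    """Dense layer: one output per weight row, plus bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert raw logits to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy weights: 2 inputs -> 3 hidden units -> 2 classes.
W1 = [[1.0, -1.0], [-0.5, 0.5], [2.0, 1.0]]
B1 = [0.0, 0.0, -1.0]
W2 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
B2 = [0.0, 0.5]


def forward_with_trace(x):
    """Run the network and keep the intermediate values of interest."""
    hidden = relu(linear(W1, B1, x))   # ReLU activations
    logits = linear(W2, B2, hidden)    # raw scores, before softmax
    return {"relu": hidden, "logits": logits, "probs": softmax(logits)}


trace = forward_with_trace([1.0, 2.0])
print(trace["relu"])    # [0.0, 0.5, 3.0]
print(trace["logits"])  # [3.0, -2.0]
```

The first hidden unit outputs zero because its pre-activation is negative, which is the kind of on/off pattern ReLU inspection surfaces. The logits are kept separate from the softmax probabilities because the raw scores preserve magnitude information that normalization discards.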
Example prompts
- "Extract the ReLU activations and logits for the input 'The cat sat on the'."
- "Show me the logits produced by the model when classifying this sentence as positive or negative."
- "Can you analyze the ReLU activations for the word 'king' in the context of the phrase 'the king is'?"
Tips & gotchas
- This skill requires a language model to be running and accessible. The agent will need appropriate permissions to query the model’s internal states.
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Scanner | Result |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
🌐 Community
Passed automated security scans.