Model Extraction ReLU Logits

🌐Community
by letta-ai · vlatest · Repository

Extracts ReLU activations and logits from a model's layers to analyze internal decision-making processes.

Install on your platform

1. Run in terminal (recommended)

claude mcp add model-extraction-relu-logits npx -- -y @trustedskills/model-extraction-relu-logits

2. Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "model-extraction-relu-logits": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-extraction-relu-logits"
      ]
    }
  }
}
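
The manual step amounts to merging one entry into the `mcpServers` object without disturbing any servers already configured. A minimal Python sketch of that merge, assuming the settings file is plain JSON; the helper name `add_mcp_server` and the pre-existing `other-server` entry are hypothetical and only serve the illustration:

```python
import json


def add_mcp_server(settings: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `settings` with one MCP server entry added or replaced."""
    merged = dict(settings)
    # Copy the existing server map so the caller's dict is not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing settings with one unrelated server already present.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}

updated = add_mcp_server(
    existing,
    "model-extraction-relu-logits",
    "npx",
    ["-y", "@trustedskills/model-extraction-relu-logits"],
)
print(json.dumps(updated, indent=2))
```

To apply this to a real file, one would load it with `json.load`, pass the result through the helper, and write it back with `json.dump`; keeping a backup of the original file first is prudent.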

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets an AI agent inspect a language model's internals, specifically its ReLU activations and its logits. Examining these intermediate values shows how the model arrives at a decision for a given input. The analysis works from these values alone and does not require reading the model's weights or architecture directly.

When to use it

  • Debugging Model Behavior: Understand why a model is producing unexpected outputs by examining ReLU activations and logits for specific inputs.
  • Analyzing Decision Boundaries: Investigate how different input features influence the model's final prediction through logit analysis.
  • Educational Purposes: Demonstrate the internal processes of language models to students or colleagues.
  • Security Auditing (Limited): Look for patterns in ReLU activations and logits that may point to vulnerabilities. This covers only a narrow part of a full security audit.
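
As one illustration of the logit analysis mentioned above: for a two-class decision, the difference between the two logits (the margin) fully determines the prediction, and its sign flips exactly at the decision boundary. The sketch below uses made-up logit values, not output from the skill, and the function names are illustrative only:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_margin(logits, a, b):
    """Logit of class `a` minus logit of class `b`; zero at the decision boundary."""
    return logits[a] - logits[b]


# Made-up logits for [positive, negative]; not real model output.
logits = [2.0, 0.5]

margin = logit_margin(logits, 0, 1)
probs = softmax(logits)

# With exactly two classes, softmax probability equals sigmoid(margin).
sigmoid_of_margin = 1.0 / (1.0 + math.exp(-margin))

print("margin:", margin)
print("P(positive) via softmax:", probs[0])
print("P(positive) via sigmoid(margin):", sigmoid_of_margin)
```

Tracking how the margin moves as an input is edited is a simple way to see which changes push a prediction toward or across the boundary.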

Key capabilities

  • ReLU Activation Extraction: Retrieves values from ReLU activation functions within the model.
  • Logit Extraction: Provides access to the raw logit scores before softmax or other post-processing steps.
  • Input Specificity: Allows targeting specific inputs for analysis.
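
To make the two extracted quantities concrete, here is a toy sketch in plain Python: a tiny two-layer network with hand-picked weights whose forward pass returns both the hidden ReLU activations and the raw logits (the scores before softmax). This is an assumption-laden illustration of the concepts, not the skill's implementation; the network, its weights, and the `forward` function are all hypothetical.

```python
def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def forward(x, W1, b1, W2, b2):
    """Run a two-layer network and record intermediate values.

    Returns the hidden-layer ReLU activations and the output logits
    (raw scores, before any softmax is applied).
    """
    pre_activations = [
        sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)
    ]
    hidden = [relu(p) for p in pre_activations]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)
    ]
    return {"relu_activations": hidden, "logits": logits}


# Hand-picked toy weights: 2 inputs -> 3 hidden units -> 2 output classes.
W1 = [[1.0, -1.0], [-2.0, 0.5], [0.5, 0.5]]
b1 = [0.0, 0.0, -1.0]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]
b2 = [0.5, 0.0]

out = forward([2.0, 1.0], W1, b1, W2, b2)
print(out["relu_activations"])  # [1.0, 0.0, 0.5]
print(out["logits"])            # [2.5, -1.0]
```

The second hidden unit outputs 0.0 because its pre-activation is negative, so it contributes nothing to the logits for this input. Patterns of which units are active or inactive for a given input are the kind of signal the capabilities above refer to.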

Example prompts

  • "Extract the ReLU activations and logits for the input 'The cat sat on the'."
  • "Show me the logits produced by the model when classifying this sentence as positive or negative."
  • "Can you analyze the ReLU activations for the word 'king' in the context of the phrase 'the king is'?"

Tips & gotchas

  • This skill requires a language model to be running and accessible. The agent will need appropriate permissions to query the model’s internal states.

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: vlatest
License: (not listed)
Author: letta-ai
Installs: 27

Passed automated security scans.