Model Extraction Relu Logits

Author: letta-ai

🌐Community


Extracts ReLU activations and logits from a model's layers to analyze internal decision-making processes.

Install on your platform


Run in terminal (recommended)


claude mcp add model-extraction-relu-logits npx -- -y @trustedskills/model-extraction-relu-logits

Or add the following manually to ~/.claude/settings.json:


{
  "mcpServers": {
    "model-extraction-relu-logits": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-extraction-relu-logits"
      ]
    }
  }
}

Requires Claude Code (the claude CLI). Run claude --version to verify your installation.

About This Skill

What it does

This skill lets an AI agent extract information about a language model's internal workings, focusing on ReLU activations (the outputs of hidden-layer ReLU units) and logits (the raw output scores before softmax). Inspecting these values helps explain how the model arrives at a decision. The skill provides these insights without requiring direct access to the model's weights or architecture.
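To make the two quantities concrete, here is a minimal pure-Python sketch of a toy one-hidden-layer ReLU network. Its forward pass returns both the hidden ReLU activations and the output logits. The weights, the network shape, and the function name `forward_with_internals` are hypothetical illustrations, not the skill's actual interface. With a real target model, these values would come from querying the model rather than from weights you already hold.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    # ReLU zeroes every negative pre-activation, element-wise
    return [max(0.0, x) for x in v]


def matvec(W: Matrix, x: Vector, b: Vector) -> Vector:
    # Computes W @ x + b for a row-major weight matrix W
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward_with_internals(x: Vector, W1: Matrix, b1: Vector,
                           W2: Matrix, b2: Vector) -> Tuple[Vector, Vector]:
    """Return (hidden ReLU activations, output logits) for a single input."""
    hidden = relu(matvec(W1, x, b1))
    logits = matvec(W2, hidden, b2)  # raw scores: no softmax applied
    return hidden, logits


# Hypothetical toy weights: 2 inputs -> 3 hidden ReLU units -> 2 output classes
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, -0.25, -0.5]
W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b2 = [0.1, -0.1]

hidden, logits = forward_with_internals([1.0, 0.5], W1, b1, W2, b2)
print("ReLU activations:", hidden)
print("logits:", logits)
```

The third hidden unit has a negative pre-activation for this input, so its ReLU output is zero and it has no influence on either logit.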

When to use it

Debugging Model Behavior: Understand why a model produces unexpected outputs by examining the ReLU activations and logits for specific inputs.
Analyzing Decision Boundaries: Investigate how individual input features influence the model's final prediction through logit analysis.
Educational Purposes: Demonstrate the internal processes of language models to students or colleagues.
Security Auditing (Limited): Spot potential vulnerabilities by observing unusual patterns in ReLU activations and logits; this complements, but does not replace, a full security audit.
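One common way to see how input features influence a prediction is to nudge each feature slightly and measure how a chosen logit moves. This is the decision-boundary use case above. The sketch below uses central finite differences on a hypothetical two-feature toy model. `logit_sensitivity` and `toy_model` are illustrative names and are not part of the skill.

```python
from typing import Callable, List


def logit_sensitivity(model: Callable[[List[float]], List[float]],
                      x: List[float], cls: int, eps: float = 1e-4) -> List[float]:
    """Central finite-difference estimate of d logits[cls] / d x[i] for each feature i."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((model(up)[cls] - model(down)[cls]) / (2 * eps))
    return grads


def toy_model(x: List[float]) -> List[float]:
    # Hypothetical model: one ReLU hidden unit feeding two logits
    h = max(0.0, 2.0 * x[0] - x[1])
    return [3.0 * h, -h + 0.5 * x[1]]


print(logit_sensitivity(toy_model, [1.0, 0.5], cls=0))  # active unit
print(logit_sensitivity(toy_model, [0.0, 1.0], cls=0))  # inactive unit
```

ReLU networks are piecewise linear, so the estimate is exact (up to rounding) when both probe points fall in the same linear region. A unit that is inactive at the input contributes zero sensitivity. Near a kink, where a pre-activation is close to zero, the two probes can straddle different regions. In that case a smaller `eps`, or a check of the hidden activations, is advisable.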

Key capabilities

ReLU Activation Extraction: Retrieves values from ReLU activation functions within the model.
Logit Extraction: Provides access to the raw logit scores before softmax or other post-processing steps.
Input Specificity: Targets the analysis at specific, user-chosen inputs.
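Logit Extraction returns scores from before softmax is applied. The minimal stdlib-only sketch below shows how those raw scores relate to the probabilities a model would normally report. The two-class logits are hypothetical values, chosen to echo the positive/negative sentiment prompt below.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities (numerically stable form)."""
    m = max(logits)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw logits for the classes ("positive", "negative")
logits = [2.0, 0.5]
probs = softmax(logits)

# The logit gap equals the log-odds between the two classes
gap = logits[0] - logits[1]
log_odds = math.log(probs[0] / probs[1])

# Adding a constant to every logit leaves the probabilities unchanged,
# so raw logits carry an offset that softmax output cannot reveal
shifted = softmax([z + 10.0 for z in logits])

print("probabilities:", probs)
print("logit gap:", gap, "log-odds:", log_odds)
print("shifted probabilities:", shifted)
```

This is one reason raw logits are useful for analysis: they keep information, such as a constant shift, that post-processing discards.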

Example prompts

"Extract the ReLU activations and logits for the input 'The cat sat on the'."
"Show me the logits produced by the model when classifying this sentence as positive or negative."
"Can you analyze the ReLU activations for the word 'king' in the phrase 'the king is'?"

Tips & gotchas

This skill requires a language model to be running and accessible. The agent will need appropriate permissions to query the model’s internal states.


TrustedSkills Verification

Unlike registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub	Pass
Socket	Pass
Snyk	Pass

Details

Version: vlatest
License
Installs: 27



Passed automated security scans.