Model Pruning

🌐Community
by davila7 · vlatest · Repository

Reduces large language model size and computational cost while preserving accuracy through strategic parameter removal.

Install on your platform


1. Run in terminal (recommended):

claude mcp add model-pruning npx -- -y @trustedskills/model-pruning
2. Or manually add to ~/.claude/settings.json:

{
  "mcpServers": {
    "model-pruning": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-pruning"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill enables AI agents to reduce the size of large language models (LLMs) and accelerate inference without retraining. It does so through model pruning, which strategically removes less important parameters from the model. The goal is to compress models by 40-60% while maintaining accuracy, enabling deployment on resource-constrained devices. Key techniques supported include Wanda, SparseGPT, and structured pruning.
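To make the Wanda criterion concrete: each weight is scored by its magnitude times the L2 norm of the corresponding input activation over a calibration batch, and the lowest-scoring weights in each output row are zeroed. A minimal NumPy sketch (the function name and shapes are illustrative, not this skill's API; real implementations operate on torch layers):

```python
import numpy as np

def wanda_prune(W, X_calib, sparsity=0.5):
    """Illustrative Wanda-style pruning of one weight matrix.

    Score: |W_ij| * ||X_j||_2, where ||X_j||_2 is the L2 norm of input
    feature j over the calibration batch X_calib (n_samples, in_features).
    The `sparsity` lowest-scoring weights in each output row are zeroed.
    """
    act_norm = np.linalg.norm(X_calib, axis=0)   # (in_features,)
    score = np.abs(W) * act_norm                 # broadcast across rows
    k = int(W.shape[1] * sparsity)               # weights to drop per row
    drop = np.argsort(score, axis=1)[:, :k]      # k lowest scores per row
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned
```

Scoring per output row (rather than globally) is what lets Wanda reach a uniform layer sparsity in one shot, with no retraining.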

When to use it

  • To reduce the overall size of an LLM for easier storage and distribution.
  • When faster inference speeds are needed, potentially achieving 2-4x speedup.
  • For deploying LLMs on devices with limited resources like mobile phones or edge computing platforms.
  • To enable efficient serving of LLMs by reducing their memory footprint.
  • When a one-shot compression method is desired, avoiding the need for retraining the model.

Key capabilities

  • Model Size Reduction: Compresses models by 40-60% with minimal accuracy loss (<1%).
  • Inference Acceleration: Potentially speeds up inference by 2-4x through hardware-friendly sparsity.
  • One-Shot Pruning: Performs compression without requiring model retraining.
  • Supports Multiple Techniques: Implements Wanda, SparseGPT, and structured pruning methods.
  • Hardware Deployment: Enables deployment on constrained hardware (mobile, edge devices).
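The "hardware-friendly" distinction above is that structured pruning removes whole neurons or channels, so the matrix actually shrinks instead of filling with scattered zeros. A toy sketch of row-wise structured pruning (names and the keep-by-L2-norm heuristic are our own illustration):

```python
import numpy as np

def prune_neurons(W, keep_ratio=0.5):
    """Structured pruning sketch: drop whole output rows (neurons) with the
    smallest L2 norms, returning a genuinely smaller matrix."""
    norms = np.linalg.norm(W, axis=1)                 # one norm per output neuron
    n_keep = max(1, int(round(W.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[-n_keep:])       # top-norm rows, original order
    return W[keep], keep                              # smaller matrix + surviving indices
```

In a real network the next layer must also drop the matching input columns; that bookkeeping is what dedicated structured-pruning tooling handles.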

Example prompts

  • "Prune the Llama-2-7B model using Wanda with a sparsity of 50%."
  • "Reduce the size of this language model for mobile deployment."
  • "Apply SparseGPT pruning to improve inference speed."

Tips & gotchas

  • Requires installation of dependencies including torch, transformers, and accelerate.
  • The Wanda implementation requires cloning and installing a separate GitHub repository.
  • A small calibration dataset is needed for the Wanda pruning function to collect activation statistics.
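The activation statistics mentioned in the last tip can be accumulated incrementally, batch by batch, rather than loading the whole calibration set at once. A hedged sketch (class name is ours; the skill's own collector hooks into torch modules instead of taking raw arrays):

```python
import numpy as np

class ActNormCollector:
    """Accumulate per-input-feature squared activation sums over calibration
    batches, yielding the ||X_j||_2 norms that a Wanda-style score multiplies in."""
    def __init__(self, in_features):
        self.sq_sum = np.zeros(in_features)
        self.n_samples = 0

    def update(self, X):                    # X: (batch, in_features)
        self.sq_sum += (X ** 2).sum(axis=0)
        self.n_samples += X.shape[0]

    def norms(self):
        return np.sqrt(self.sq_sum)         # L2 norm of each input feature
```

Because only squared sums are stored, the memory cost is one vector per layer regardless of calibration-set size.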


TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License:
  • Author: davila7
  • Installs: 169
