Model Pruning

🌐Community
by davila7 · vlatest · Repository

Reduces large language model size and computational cost while preserving accuracy through strategic parameter removal.

Install on your platform


1. Run in terminal (recommended):

claude mcp add model-pruning npx -- -y @trustedskills/model-pruning
2. Or manually add to ~/.claude/settings.json:

{
  "mcpServers": {
    "model-pruning": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/model-pruning"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill enables AI agents to reduce the size of large language models (LLMs) and accelerate inference without retraining. It does so through model pruning, which strategically removes less important parameters from the model. The goal is to compress models by 40-60% while maintaining accuracy, enabling deployment on resource-constrained devices. Key techniques supported include Wanda, SparseGPT, and structured pruning.
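To make the Wanda criterion concrete: each weight is scored by its magnitude times the L2 norm of the corresponding input activation over a calibration batch, and the lowest-scoring weights in each output row are zeroed. A minimal NumPy sketch (the function name and shapes are illustrative, not this skill's API; real implementations operate on torch layers):

```python
import numpy as np

def wanda_prune(W, X_calib, sparsity=0.5):
    """Illustrative Wanda-style pruning of one weight matrix.

    Score: |W_ij| * ||X_j||_2, where ||X_j||_2 is the L2 norm of input
    feature j over the calibration batch X_calib (n_samples, in_features).
    The `sparsity` lowest-scoring weights in each output row are zeroed.
    """
    act_norm = np.linalg.norm(X_calib, axis=0)   # (in_features,)
    score = np.abs(W) * act_norm                 # broadcast across rows
    k = int(W.shape[1] * sparsity)               # weights to drop per row
    drop = np.argsort(score, axis=1)[:, :k]      # k lowest scores per row
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned
```

Scoring per output row (rather than globally) is what lets Wanda reach a uniform layer sparsity in one shot, with no retraining.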

When to use it

  • To reduce the overall size of an LLM for easier storage and distribution.
  • When faster inference speeds are needed, potentially achieving 2-4x speedup.
  • For deploying LLMs on devices with limited resources like mobile phones or edge computing platforms.
  • To enable efficient serving of LLMs by reducing their memory footprint.
  • When a one-shot compression method is desired, avoiding the need for retraining the model.

Key capabilities

  • Model Size Reduction: Compresses models by 40-60% with minimal accuracy loss (<1%).
  • Inference Acceleration: Potentially speeds up inference by 2-4x through hardware-friendly sparsity.
  • One-Shot Pruning: Performs compression without requiring model retraining.
  • Supports Multiple Techniques: Implements Wanda, SparseGPT, and structured pruning methods.
  • Hardware Deployment: Enables deployment on constrained hardware (mobile, edge devices).
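The "hardware-friendly" distinction above is that structured pruning removes whole neurons or channels, so the matrix actually shrinks instead of filling with scattered zeros. A toy sketch of row-wise structured pruning (names and the keep-by-L2-norm heuristic are our own illustration):

```python
import numpy as np

def prune_neurons(W, keep_ratio=0.5):
    """Structured pruning sketch: drop whole output rows (neurons) with the
    smallest L2 norms, returning a genuinely smaller matrix."""
    norms = np.linalg.norm(W, axis=1)                 # one norm per output neuron
    n_keep = max(1, int(round(W.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[-n_keep:])       # top-norm rows, original order
    return W[keep], keep                              # smaller matrix + surviving indices
```

In a real network the next layer must also drop the matching input columns; that bookkeeping is what dedicated structured-pruning tooling handles.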

Example prompts

  • "Prune the Llama-2-7B model using Wanda with a sparsity of 50%."
  • "Reduce the size of this language model for mobile deployment."
  • "Apply SparseGPT pruning to improve inference speed."

Tips & gotchas

  • Requires installation of dependencies including torch, transformers, and accelerate.
  • The Wanda implementation requires cloning and installing a separate GitHub repository.
  • A small calibration dataset is needed for the Wanda pruning function to collect activation statistics.
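The activation statistics mentioned in the last tip can be accumulated incrementally, batch by batch, rather than loading the whole calibration set at once. A hedged sketch (class name is ours; the skill's own collector hooks into torch modules instead of taking raw arrays):

```python
import numpy as np

class ActNormCollector:
    """Accumulate per-input-feature squared activation sums over calibration
    batches, yielding the ||X_j||_2 norms that a Wanda-style score multiplies in."""
    def __init__(self, in_features):
        self.sq_sum = np.zeros(in_features)
        self.n_samples = 0

    def update(self, X):                    # X: (batch, in_features)
        self.sq_sum += (X ** 2).sum(axis=0)
        self.n_samples += X.shape[0]

    def norms(self):
        return np.sqrt(self.sq_sum)         # L2 norm of each input feature
```

Because only squared sums are stored, the memory cost is one vector per layer regardless of calibration-set size.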


TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License:
  • Author: davila7
  • Installs: 169
