Devops Troubleshooter

🌐Community
by sickn33 Β· vlatest Β· Repository

Diagnoses and resolves common DevOps pipeline errors, infrastructure issues, and deployment failures quickly.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1

Run in terminal (recommended)

terminal
claude mcp add devops-troubleshooter npx -- -y @trustedskills/devops-troubleshooter
2

Or manually add to ~/.claude/settings.json

~/.claude/settings.json
{
  "mcpServers": {
    "devops-troubleshooter": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/devops-troubleshooter"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

The Devops Troubleshooter skill enables AI agents to diagnose and resolve common DevOps pipeline errors, infrastructure issues, and deployment failures. It specializes in rapid incident response, advanced debugging, and modern observability practices, leveraging expertise in log analysis, distributed tracing, and system reliability engineering. The agent can provide actionable steps for problem resolution and root cause analysis, drawing upon a comprehensive knowledge base of tools and techniques.

When to use it

  • When facing DevOps pipeline errors or deployment failures that require rapid troubleshooting.
  • To receive guidance on best practices and checklists for resolving DevOps issues.
  • During incident response scenarios where quick problem resolution is critical.

Key capabilities

  • Modern Observability & Monitoring: Expertise with logging platforms (ELK Stack, Loki/Grafana, Fluentd), APM solutions (DataDog, New Relic), metrics tools (Prometheus, Grafana), and distributed tracing systems (Jaeger, Zipkin).
  • Container & Kubernetes Debugging: Proficiency in kubectl commands, container runtime debugging (Docker, containerd), pod troubleshooting, service mesh debugging (Istio, Linkerd), and Kubernetes networking.
  • Network & DNS Troubleshooting: Ability to analyze network traffic with tools like tcpdump and debug DNS propagation issues.
  • Performance & Resource Analysis: Expertise in identifying performance bottlenecks related to CPU, memory, disk I/O, and database query optimization.

Example prompts

  • "I'm seeing high latency on our API endpoint. Can you help me troubleshoot?"
  • "My Kubernetes pod is stuck in a CrashLoopBackOff state. What are the likely causes and how can I fix it?"
  • β€œWhat’s the best way to debug network connectivity issues between my microservices?”

Tips & gotchas

  • Clearly define the goals, constraints, and required inputs for the troubleshooting task.
  • For more detailed guidance or implementation examples, consult the resources/implementation-playbook.md file.
  • The skill is specifically focused on DevOps troubleshooting; tasks outside this scope are not within its expertise.

Tags

πŸ›‘οΈ

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates β€” what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust HubPass
SocketPass
SnykPass

Details

Version
vlatest
License
Author
sickn33
Installs
115

🌐 Community

Passed automated security scans.