Skypilot Multi Cloud Orchestration

🌐 Community
by orchestra-research · version: latest

Skypilot Multi Cloud Orchestration automates the deployment of workloads across multiple clouds through a single interface, which simplifies multi-cloud workflows and improves portability and cost efficiency.

Install on your platform

1. Run in terminal (recommended):

claude mcp add orchestra-research-skypilot-multi-cloud-orchestration npx -- -y @trustedskills/orchestra-research-skypilot-multi-cloud-orchestration

2. Or manually add to ~/.claude/settings.json:

{
  "mcpServers": {
    "orchestra-research-skypilot-multi-cloud-orchestration": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/orchestra-research-skypilot-multi-cloud-orchestration"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.
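
If ~/.claude/settings.json already exists, pasting the block above over it would discard any other settings. The manual step can be sketched as a merge instead. This is a minimal sketch, assuming the file holds a single JSON object; the function name add_mcp_server is hypothetical, while the server name and entry are copied from the snippet above.

```python
import json
from pathlib import Path

# Server name and entry copied from the manual configuration above.
SERVER_NAME = "orchestra-research-skypilot-multi-cloud-orchestration"
SERVER_ENTRY = {
    "command": "npx",
    "args": ["-y", "@trustedskills/orchestra-research-skypilot-multi-cloud-orchestration"],
}


def add_mcp_server(settings_path: Path) -> dict:
    """Merge the server entry into settings_path, keeping any existing keys.

    Hypothetical helper: assumes the file, if present, contains one JSON object.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        settings = json.loads(text) if text.strip() else {}
    # setdefault keeps other servers that may already be registered.
    settings.setdefault("mcpServers", {})[SERVER_NAME] = SERVER_ENTRY
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling add_mcp_server(Path.home() / ".claude" / "settings.json") would apply the change to the path named in step 2; back up the file first if it contains settings you care about.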

About This Skill

What it does

Skypilot Multi Cloud Orchestration automates the deployment and management of machine learning (ML) workloads across cloud providers, including AWS, GCP, Azure, Kubernetes, Lambda, and RunPod. It provides a single interface to more than 20 cloud platforms and reduces cost by automatically selecting the cheapest available cloud and region. The skill also supports long-running jobs on spot instances, with automatic recovery when an instance is interrupted.
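
The idea behind "cheapest available region or cloud" can be illustrated with a small selection routine. This is only a sketch of the concept, not SkyPilot's actual optimizer: the Offer type, the cheapest_offer function, and every price below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    """One (cloud, region) option for a given instance type. Hypothetical type."""
    cloud: str
    region: str
    hourly_usd: float
    available: bool


def cheapest_offer(offers: Iterable[Offer]) -> Optional[Offer]:
    """Return the lowest-priced offer that is currently available, if any."""
    candidates = [o for o in offers if o.available]
    return min(candidates, key=lambda o: o.hourly_usd, default=None)


# Placeholder numbers for illustration only; these are NOT real cloud prices.
OFFERS = [
    Offer("aws", "us-east-1", 0.50, True),
    Offer("gcp", "us-central1", 0.35, True),
    Offer("azure", "eastus", 0.30, False),  # cheapest, but unavailable
]

best = cheapest_offer(OFFERS)
print(best)
```

The unavailable offer is skipped even though it is the cheapest, which is the behavior the description implies: price is compared only among options that can actually be provisioned.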

When to use it

  • You need to run ML workloads across multiple clouds (AWS, GCP, Azure).
  • Cost optimization is a priority and you want automatic cloud/region selection.
  • You're running long jobs that benefit from using spot instances with auto-recovery.
  • You require management of distributed multi-node training environments.
  • You want to avoid vendor lock-in by distributing workloads across various providers.

Key capabilities

  • Multi-cloud support: Works with AWS, GCP, Azure, Kubernetes, Lambda Cloud, RunPod, and other providers (20+ in total).
  • Cost optimization: Automatically selects the cheapest available cloud and region.
  • Spot instance management: Uses spot instances for cost savings while providing automatic recovery from interruptions.
  • Distributed training: Supports multi-node jobs with gang scheduling.
  • Managed jobs: Provides auto-recovery, checkpointing, and fault tolerance features.
  • Model serving (Sky Serve): Enables model deployment with autoscaling capabilities.
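
Auto-recovery on spot instances usually restarts the job after an interruption, so the training script itself needs to save progress and resume from it. The following is a minimal sketch of that pattern, assuming the checkpoint path lives on storage that survives the interruption (for example a mounted bucket). The file format and the function names load_step, save_step, and train are hypothetical.

```python
import json
import os
from pathlib import Path


def load_step(ckpt: Path) -> int:
    """Return the last saved step, or 0 when no checkpoint exists yet."""
    if not ckpt.exists():
        return 0
    return json.loads(ckpt.read_text(encoding="utf-8"))["step"]


def save_step(ckpt: Path, step: int) -> None:
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = ckpt.with_suffix(".tmp")
    tmp.write_text(json.dumps({"step": step}), encoding="utf-8")
    os.replace(tmp, ckpt)  # atomic rename on the same filesystem


def train(ckpt: Path, total_steps: int, save_every: int = 10) -> int:
    """Resume from the last checkpoint and run until total_steps."""
    step = load_step(ckpt)
    while step < total_steps:
        step += 1  # stand-in for one real training step
        if step % save_every == 0 or step == total_steps:
            save_step(ckpt, step)
    return step
```

A real script would store model and optimizer state alongside the step counter; the counter alone is enough to show the resume logic.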

Example prompts

  • "Launch a cluster on the cheapest available cloud region with 1 T4 GPU."
  • "Run my training script (train.py) using spot instances and automatically recover if interrupted."
  • "Deploy my model as a serving endpoint with autoscaling enabled."
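
For orientation, the prompts above correspond roughly to SkyPilot CLI invocations like the ones built below. This is a sketch under assumptions: which commands the skill actually runs is not stated on this page, the file names train.yaml and service.yaml and the cluster, job, and service names are hypothetical, and flags can differ between SkyPilot versions (check sky launch --help, sky jobs launch --help, and sky serve up --help). The code only builds and prints the commands; it does not execute them.

```python
import shlex
from typing import List


def launch_cluster_cmd(task_yaml: str, cluster: str, gpus: str = "T4:1") -> List[str]:
    """Command for launching a cluster with the given accelerator request."""
    return ["sky", "launch", "-c", cluster, "--gpus", gpus, task_yaml]


def managed_spot_job_cmd(task_yaml: str, name: str) -> List[str]:
    """Command for a managed job that runs on spot instances."""
    return ["sky", "jobs", "launch", "-n", name, "--use-spot", task_yaml]


def serve_cmd(service_yaml: str, name: str) -> List[str]:
    """Command for bringing up a Sky Serve service."""
    return ["sky", "serve", "up", "-n", name, service_yaml]


# File and resource names other than hello.yaml are hypothetical.
commands = [
    launch_cluster_cmd("hello.yaml", "demo"),
    managed_spot_job_cmd("train.yaml", "train-job"),
    serve_cmd("service.yaml", "my-model"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it safe to pass to subprocess.run later without shell quoting issues.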

Tips & gotchas

  • You'll need to install the SkyPilot package: pip install "skypilot[aws,gcp,azure,kubernetes]".
  • Run sky check before launching any resources to verify that your cloud credentials are valid and configured.
  • The skill uses SkyPilot task YAML files (for example hello.yaml) to define resource requirements and the commands to run, so familiarity with that format helps.
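
To make the task YAML format concrete, the sketch below renders a small task file as text. The keys resources, accelerators, use_spot, num_nodes, and run are SkyPilot task fields; the helper render_task_yaml and the run command are hypothetical, and the result has not been validated against any particular SkyPilot version.

```python
def render_task_yaml(run_cmd: str, accelerators: str = "T4:1",
                     use_spot: bool = False, num_nodes: int = 1) -> str:
    """Render a minimal SkyPilot-style task YAML as a string (hypothetical helper)."""
    lines = [
        "resources:",
        f"  accelerators: {accelerators}",
        f"  use_spot: {str(use_spot).lower()}",  # YAML booleans are lowercase
        "",
        f"num_nodes: {num_nodes}",
        "",
        "run: |",
        f"  {run_cmd}",
    ]
    return "\n".join(lines) + "\n"


# train.py is the script name used in the example prompts above.
task_text = render_task_yaml("python train.py", use_spot=True)
print(task_text)
```

Saving the printed text as hello.yaml gives a file of the kind the skill works with: a resources section describing what to provision and a run section with the command to execute.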

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License: not listed
Author: orchestra-research
Installs: 27

Passed automated security scans.