Spark Python Data Source

🌐 Community
by databricks-solutions · vlatest · Repository

Helps with Python and data workflows as part of developing backend services and APIs.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1. Run in terminal (recommended)
claude mcp add spark-python-data-source npx -- -y @trustedskills/spark-python-data-source
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "spark-python-data-source": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/spark-python-data-source"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill allows AI agents to interact with Apache Spark data sources using Python. It enables retrieving, transforming, and analyzing data stored within Spark environments, facilitating tasks like data exploration, report generation, and integration with other systems. The agent can execute Python code against a Spark cluster, leveraging its distributed processing capabilities.

When to use it

  • Data Analysis: An agent needs to analyze sales trends from a Spark-managed data warehouse.
  • Report Generation: Automatically generate daily reports based on data residing in a Spark cluster.
  • ETL Pipelines: Integrate data from external sources into a Spark-based ETL pipeline.
  • Data Exploration: Allow users to query and explore large datasets stored within a Spark environment.

Key capabilities

  • Python code execution against a Spark cluster.
  • Data retrieval from various Spark data sources.
  • Data transformation using Python libraries.
  • Integration with Spark's distributed processing framework.

Example prompts

  • "Run this Python script to calculate the average order value in our sales data stored in Spark."
  • "Generate a report showing daily active users, pulling data from the Spark analytics cluster."
  • "Can you extract the top 10 products by revenue from the Spark data warehouse and format them as a CSV?"

Tips & gotchas

  • Requires access to a configured Apache Spark environment.
  • Familiarity with Python and basic Spark concepts is helpful for crafting effective prompts.


🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

Version: vlatest
License:
Author: databricks-solutions
Installs: 5


Passed automated security scans.