Data Pipeline Engineer

🌐Community
by erichowens · latest · Repository

Helps with data pipeline design and automation as part of DevOps and CI/CD workflows.

Install on your platform

We auto-selected Claude Code based on this skill’s supported platforms.

1. Run in terminal (recommended)
claude mcp add data-pipeline-engineer npx -- -y @trustedskills/data-pipeline-engineer
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "data-pipeline-engineer": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/data-pipeline-engineer"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill enables AI agents to act as expert data engineers specializing in building and optimizing data pipelines. It assists with tasks ranging from identifying data sources and choosing appropriate architectures (like Medallion, Lambda, or Kappa) to designing layers for data transformation and implementing quality gates. The skill leverages technologies like Spark, Kafka, Airflow, dbt, Great Expectations, and others commonly used in modern data stacks.
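To make the Medallion architecture mentioned above concrete, here is a toy plain-Python sketch of the bronze → silver → gold layering; in practice the skill would steer you toward Spark or dbt for this, and the record shapes here are invented for illustration:

```python
# Toy Medallion pipeline: bronze (raw) -> silver (cleaned/typed) -> gold (aggregated).
# Plain-Python stand-in for what would normally be Spark or dbt jobs.

bronze = [  # raw ingested records: untyped strings, possibly dirty
    {"order_id": "1", "amount": "19.99", "region": "us"},
    {"order_id": "2", "amount": "bad",   "region": "eu"},
    {"order_id": "3", "amount": "5.00",  "region": "us"},
]

def to_silver(records):
    """Clean and type raw rows; drop rows that fail parsing."""
    silver = []
    for r in records:
        try:
            silver.append({"order_id": int(r["order_id"]),
                           "amount": float(r["amount"]),
                           "region": r["region"]})
        except ValueError:
            continue  # a real pipeline would quarantine this row instead
    return silver

def to_gold(records):
    """Aggregate to a business-ready metric: revenue per region."""
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # us totals ~24.99; the bad "eu" row was dropped at the silver layer
```

The point is the separation of concerns: each layer has one job, so failures and reprocessing stay localized.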

When to use it

  • You need help designing a new ETL/ELT pipeline from raw data sources to business-ready datasets.
  • You're struggling to choose between Lambda and Kappa architectures for your streaming data processing needs.
  • You want assistance structuring a dbt project with staging, intermediate, and mart layers.
  • You need guidance on implementing data quality checks using Great Expectations or dbt tests within a pipeline.
  • You are building an Airflow DAG for orchestrating data transformations and require help with sensors and task groups.
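The "quality gates" idea from the bullets above can be sketched in a few lines of plain Python. This is an invented, simplified stand-in for what a Great Expectations suite or dbt tests would do inside a pipeline, not either tool's real API:

```python
# Minimal data-quality gate: run named checks over rows and report failures.
# Hypothetical stand-in for Great Expectations / dbt tests; check names are made up.

rows = [
    {"id": 1, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "email": "b@example.com", "amount": 25.5},
]

checks = {
    "id_not_null":     lambda r: r["id"] is not None,
    "amount_positive": lambda r: r["amount"] > 0,
    "email_has_at":    lambda r: "@" in r["email"],
}

def run_quality_gate(rows, checks):
    """Return (passed, failures); a real pipeline would halt or quarantine on failure."""
    failures = [(name, r)
                for r in rows
                for name, check in checks.items()
                if not check(r)]
    return (len(failures) == 0, failures)

passed, failures = run_quality_gate(rows, checks)
print("gate passed:", passed)
```

In a real orchestrated pipeline, the gate would sit between transformation stages so bad data never reaches the mart layer.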

Key capabilities

  • Batch Processing: Expertise in Spark, dbt, and Databricks for incremental processing, partitioning, and Delta/Iceberg table formats.
  • Stream Processing: Proficiency in Kafka, Flink, and Spark Streaming for watermarks, exactly-once semantics, and windowing operations.
  • Orchestration: Knowledge of Airflow, Dagster, and Prefect for DAG design, sensor implementation, and task grouping.
  • Data Modeling: Experience with Kimball, Data Vault, and Slowly Changing Dimensions (SCD) modeling techniques using dbt and SQL.
  • Data Quality: Ability to implement validation suites and freshness checks using Great Expectations and dbt tests.
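To illustrate the SCD capability above, here is a simplified Type 2 merge in plain Python: when a tracked attribute changes, the current row is expired and a new version is appended. Real implementations would use dbt snapshots or a SQL MERGE; the column names here are illustrative:

```python
# Simplified Slowly Changing Dimension Type 2 merge (illustrative columns).

def scd2_merge(dim_rows, incoming, key, tracked, load_date):
    """Expire changed current rows and append new versions."""
    current = {r[key]: r for r in dim_rows if r["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # no change in tracked columns: keep the current version
        if old is not None:
            old["is_current"] = False   # expire the previous version
            old["valid_to"] = load_date
        row = dict(new)
        row.update({"is_current": True, "valid_from": load_date, "valid_to": None})
        dim_rows.append(row)
    return dim_rows

dim = [{"customer_id": 1, "city": "Paris", "is_current": True,
        "valid_from": "2024-01-01", "valid_to": None}]
dim = scd2_merge(dim, [{"customer_id": 1, "city": "Lyon"}],
                 key="customer_id", tracked=["city"], load_date="2024-06-01")
# The Paris row is now expired; a current Lyon row has been appended.
```

The same compare-expire-append shape is what a dbt snapshot generates under the hood, with hashing and timestamps handled for you.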

Example prompts

  • "I need help designing a Medallion architecture for ingesting data from Salesforce into Snowflake."
  • "What are the pros and cons of Lambda versus Kappa architectures for real-time fraud detection?"
  • "Can you show me an example Airflow DAG with sensors to monitor file arrival and trigger downstream tasks?"
  • "I want to implement a dbt project. What is a good directory structure?"
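For the last prompt, a conventional dbt layout follows the staging / intermediate / marts split; the file and folder names below are illustrative, not mandated by dbt:

```text
my_dbt_project/
├── dbt_project.yml
└── models/
    ├── staging/          # 1:1 with sources; rename, cast, light cleanup
    │   ├── stg_orders.sql
    │   └── _stg_sources.yml
    ├── intermediate/     # reusable joins and business logic
    │   └── int_orders_enriched.sql
    └── marts/            # business-ready models exposed to consumers
        ├── fct_orders.sql
        └── dim_customers.sql
```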

Tips & gotchas

  • The skill is strongest at guiding architectural decisions and data engineering best practices; it provides reference examples rather than complete, production-ready code implementations.
  • Familiarity with core data engineering concepts (ETL/ELT, streaming architectures, data warehousing) will help you use the skill effectively.
  • Explore the ./references/ directory within the skill's files for detailed implementation examples of dbt projects, Airflow DAGs, and Spark Streaming jobs.


TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License:
Author: erichowens
Installs: 39

Passed automated security scans.