Data Engineer

🌐 Community
by sickn33 · latest · Repository

Automates data pipeline creation, transformation, and loading from diverse sources into usable formats.

Install on your platform


1. Run in terminal (recommended):

   claude mcp add data-engineer npx -- -y @trustedskills/data-engineer
2. Or manually add to ~/.claude/settings.json:
{
  "mcpServers": {
    "data-engineer": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/data-engineer"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill transforms an AI agent into a specialized data engineer, capable of designing and implementing scalable data pipelines and modern data architectures. It focuses on building robust solutions encompassing batch and streaming processing, data warehousing, lakehouse architectures, and cloud-native data services. The skill guides the agent through defining sources, choosing appropriate tools, implementing transformations, and monitoring pipeline performance to ensure reliability and cost-effectiveness.
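The define-sources, transform, and load flow described above can be sketched as a minimal batch pipeline in plain Python. This is an illustrative sketch only; the field names (`user_id`, `amount`) and the in-memory sink are assumptions, not part of the skill:

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Read raw CSV text into row dicts (the 'source' step)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Normalize types and drop rows missing the primary key."""
    out = []
    for row in rows:
        if not row.get("user_id"):
            continue  # skip unusable records
        out.append({"user_id": row["user_id"], "amount": float(row["amount"])})
    return out

def load(rows: list[dict], sink: list) -> int:
    """Append transformed rows to the sink; return the count for monitoring."""
    sink.extend(rows)
    return len(rows)

raw = "user_id,amount\nu1,9.99\n,3.50\nu2,12.00\n"
sink: list[dict] = []
loaded = load(transform(extract(raw)), sink)
```

In a real deployment, each step would be an orchestrated task (e.g., in Airflow) writing to a warehouse or lakehouse table rather than an in-memory list, with the row count feeding a monitoring dashboard.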

When to use it

  • Designing new batch or streaming data pipelines.
  • Building a data warehouse or lakehouse architecture for your organization.
  • Implementing data quality checks, lineage tracking, or governance procedures within your data infrastructure.

Key capabilities

  • Modern Data Stack Expertise: Proficiency in technologies like Delta Lake, Apache Iceberg, Apache Hudi, Snowflake, BigQuery, Redshift, Databricks SQL, AWS S3, Azure Data Lake, Google Cloud Storage, Fivetran/Airbyte, dbt, and BI tools.
  • Data Mesh Architecture: Understanding of domain-driven data ownership principles.
  • Real-time Analytics: Experience with Apache Pinot, ClickHouse, and Apache Druid for real-time analytics solutions.
  • Batch Processing & ETL/ELT: Expertise in Apache Spark 4.0, dbt Core/Cloud, and Apache Airflow.

Example prompts

  • "Design a data pipeline to ingest website clickstream data into Snowflake using Airflow."
  • "Help me build a data lakehouse architecture on AWS S3 with Delta Lake for storing customer transaction data."
  • "Implement dbt transformations to standardize product category names in our BigQuery dataset."
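As an illustration of the third prompt, category standardization can be sketched in plain Python; in practice this logic would live in a dbt model backed by a seed or reference table, and the mapping below is hypothetical:

```python
# Hypothetical synonym -> canonical mapping; real mappings would come from
# a seed table or reference data, not hard-coded values.
CANONICAL = {
    "electronics": "Electronics",
    "consumer electronics": "Electronics",
    "home & garden": "Home & Garden",
    "home and garden": "Home & Garden",
}

def standardize_category(raw: str) -> str:
    """Trim and lowercase the input, then map it to a canonical label;
    unknown categories pass through unchanged for later review."""
    key = raw.strip().lower()
    return CANONICAL.get(key, raw.strip())
```

Passing unknown values through (rather than failing) keeps the pipeline running while surfacing unmapped categories for the data team to triage.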

Tips & gotchas

  • Data Contracts are Key: Clearly define data sources, service level agreements (SLAs), and data contracts before starting any pipeline design.
  • Security First: Always prioritize protecting Personally Identifiable Information (PII) and enforce least-privilege access controls.
  • Validation is Crucial: Validate data before writing it to production storage sinks to ensure data quality.

🛡️

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License:
Author: sickn33
Installs: 121


Passed automated security scans.