Data Engineer
Automates the creation of data pipelines that extract, transform, and load data from diverse sources into usable formats.
Install on your platform
We auto-selected Claude Code based on this skill’s supported platforms.
Run in terminal (recommended)
claude mcp add data-engineer -- npx -y @trustedskills/data-engineer
Or manually add to ~/.claude/settings.json
```json
{
  "mcpServers": {
    "data-engineer": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/data-engineer"
      ]
    }
  }
}
```
Requires Claude Code (claude CLI). Run claude --version to verify your install.
About This Skill
What it does
This skill transforms an AI agent into a specialized data engineer, capable of designing and implementing scalable data pipelines and modern data architectures. It focuses on building robust solutions encompassing batch and streaming processing, data warehousing, lakehouse architectures, and cloud-native data services. The skill guides the agent through defining sources, choosing appropriate tools, implementing transformations, and monitoring pipeline performance to ensure reliability and cost-effectiveness.
When to use it
- Designing new batch or streaming data pipelines.
- Building a data warehouse or lakehouse architecture for your organization.
- Implementing data quality checks, lineage tracking, or governance procedures within your data infrastructure.
Key capabilities
- Modern Data Stack Expertise: Proficiency in technologies like Delta Lake, Apache Iceberg, Apache Hudi, Snowflake, BigQuery, Redshift, Databricks SQL, AWS S3, Azure Data Lake, Google Cloud Storage, Fivetran/Airbyte, dbt, and BI tools.
- Data Mesh Architecture: Understanding of domain-driven data ownership principles.
- Real-time Analytics: Experience building real-time analytics solutions with Apache Pinot, ClickHouse, and Apache Druid.
- Batch Processing & ETL/ELT: Expertise in Apache Spark 4.0, dbt Core/Cloud, and Apache Airflow.
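To make the batch-processing capability concrete, here is a minimal PySpark sketch of the read-transform-write pattern the skill builds on. It assumes pyspark and delta-spark are installed and the Spark session is configured for Delta Lake; the bucket paths and column names are hypothetical.

```python
# Minimal batch ETL sketch: read raw JSON, clean it, append to a Delta table.
# Assumes pyspark and delta-spark are installed and the session is configured
# for Delta; bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Extract: one day of raw order events from object storage.
raw = spark.read.json("s3://example-bucket/raw/orders/2024-01-01/")

# Transform: drop rows missing required keys, standardize types, add audit columns.
orders = (
    raw.dropna(subset=["order_id", "amount"])
       .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
       .withColumn("order_date", F.to_date("event_time"))
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: append into a Delta table partitioned by order date.
(orders.write.format("delta")
       .mode("append")
       .partitionBy("order_date")
       .save("s3://example-bucket/lakehouse/orders/"))
```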
Example prompts
- "Design a data pipeline to ingest website clickstream data into Snowflake using Airflow."
- "Help me build a data lakehouse architecture on AWS S3 with Delta Lake for storing customer transaction data."
- "Implement dbt transformations to standardize product category names in our BigQuery dataset."
Tips & gotchas
- Data Contracts are Key: Clearly define data sources, service level agreements (SLAs), and data contracts before starting any pipeline design.
- Security First: Always prioritize protecting Personally Identifiable Information (PII) and enforce least-privilege access controls.
- Validation is Crucial: Validate data before writing it to production storage sinks to ensure data quality.
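To illustrate the validation tip, here is a minimal pre-write check in Python, assuming pandas; the required columns and null-rate threshold are illustrative, not part of the skill itself.

```python
# Minimal pre-write validation sketch using pandas; the required columns and
# thresholds are illustrative, not part of the skill itself.
import pandas as pd

EXPECTED_COLUMNS = {"order_id", "customer_id", "amount"}

def validate_before_write(df: pd.DataFrame) -> pd.DataFrame:
    """Raise on contract violations so a bad batch never reaches the sink."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if df["order_id"].duplicated().any():
        raise ValueError("duplicate order_id values in batch")
    null_rate = df["amount"].isna().mean()
    if null_rate > 0.01:  # illustrative threshold: at most 1% null amounts
        raise ValueError(f"amount null rate {null_rate:.2%} exceeds 1%")
    return df
```

Running a check like this in front of every sink write turns silent data drift into an immediate, visible pipeline failure.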
TrustedSkills Verification
Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.
Security Audits
| Auditor | Result |
| --- | --- |
| Gen Agent Trust Hub | Pass |
| Socket | Pass |
| Snyk | Pass |
Passed automated security scans.