Stata Data Cleaning

🌐Community
by meleantonio · version: latest

Automates repetitive Stata data cleaning tasks like handling missing values, outliers, and variable transformations for efficient analysis.

Install on Claude Code

1. Run in a terminal (recommended):

claude mcp add stata-data-cleaning npx -- -y @trustedskills/stata-data-cleaning

2. Or add the server manually to ~/.claude/settings.json:

{
  "mcpServers": {
    "stata-data-cleaning": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/stata-data-cleaning"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets AI agents perform data cleaning directly in the Stata statistical software environment. It automates the preparation of datasets for econometric analysis by handling missing values, recoding variables, and working with Stata's .dta data files and .do command scripts.

When to use it

  • You need to preprocess raw survey data before running regression models in Stata.
  • Your dataset contains inconsistent variable names or types that require standardization for analysis.
  • You want to automate repetitive cleaning steps like dropping outliers or imputing missing values across large datasets.
  • You are working with legacy Stata files and need to convert them into a clean, analysis-ready format.
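As a rough illustration of the standardization case above, the .do sketch below lowercases variable names and converts an income column stored as text into a numeric variable. The variables RespID and IncomeText and their values are hypothetical, entered inline so the sketch runs on its own. The commands (rename, destring, compress) are standard Stata, but this is not necessarily the code the skill itself generates.

```stata
* Hypothetical raw data: mixed-case names, income stored as text
clear
input str10 RespID str12 IncomeText
"r001" "42,000"
"r002" "38,500"
"r003" ""
end

* Lowercase every variable name (rename group syntax)
rename *, lower

* Convert text income to numeric, stripping the thousands separator;
* the empty string in r003 becomes a numeric missing value (.)
destring incometext, generate(income) ignore(",")
drop incometext

* Store each variable in the smallest type that loses no information
compress
describe
```

Without ignore() or force, destring refuses to convert a variable that still contains non-numeric characters. The force option turns such values into missing silently, so inspect the data before reaching for it.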

Key capabilities

  • Executes Stata commands for data manipulation and transformation.
  • Identifies missing values and applies treatment strategies such as dropping or imputing them.
  • Manages variable renaming and recoding within the Stata environment.
  • Generates cleaned dataset files compatible with standard econometric workflows.
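A minimal .do sketch of these capabilities, run on Stata's bundled auto dataset, whose rep78 variable has five missing values. Dropping incomplete rows is only one treatment strategy. The output file name auto_clean.dta and the new variable names rep_high and miles_per_gallon are illustrative choices, not names the skill is known to use.

```stata
* Load Stata's bundled example dataset (rep78 has missing values)
sysuse auto, clear

* 1. Identify missing values across all variables
misstable summarize

* 2. Treat them: here, simply drop rows with a missing repair record
drop if missing(rep78)

* 3. Recode the 1-5 repair record into a labeled low/high indicator
recode rep78 (1/3 = 0 "low") (4/5 = 1 "high"), generate(rep_high)

* 4. Rename a variable to a more descriptive name
rename mpg miles_per_gallon

* 5. Save the cleaned dataset (illustrative file name)
save auto_clean.dta, replace
```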

Example prompts

  • "Clean this raw survey dataset by removing rows with missing income values and recoding gender as a binary numeric variable."
  • "Load my .dta file, drop observations where age is below 18 or above 90, and save the result as cleaned_data.dta."
  • "Standardize variable names in this dataset to snake_case format and create a summary table of missing values for each column."
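For reference, the first two prompts correspond roughly to the Stata commands below. The survey data are hypothetical and entered inline, and the skill may write its own .do code differently.

```stata
* Hypothetical raw survey extract, entered inline for illustration
clear
input id age income str6 gender
1 17 .     "female"
2 25 31000 "male"
3 44 52000 "female"
4 93 27000 "male"
5 60 .     "male"
6 38 45000 "female"
end

* Prompt 1: drop rows with missing income, recode gender as 0/1
drop if missing(income)
generate byte female = (gender == "female") if !missing(gender)
label define female_lbl 0 "male" 1 "female"
label values female female_lbl

* Prompt 2: drop ages below 18 or above 90, then save.
* Stata treats missing (.) as larger than any number, so a bare
* "age > 90" would also drop rows with missing age.
drop if age < 18 | (age > 90 & !missing(age))
save cleaned_data.dta, replace
```

Whether rows with missing age should also be dropped is a judgment call the prompt leaves open; the !missing(age) guard keeps them.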

Tips & gotchas

Make sure your source data is in a form Stata can read (for example, a .dta file, or a CSV file that Stata's import delimited command can load) before requesting cleaning operations. Complex logic such as conditional imputation may require iterative prompting or a custom .do file rather than a single command.
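Conditional imputation is a case where a short .do file is clearer than a one-line command. The sketch below fills missing income with the median of the respondent's own region and flags the imputed rows. The variables region and income are hypothetical, and group-median imputation is only one possible strategy (Stata's mi commands for multiple imputation are another).

```stata
* Hypothetical data: income missing for some respondents in each region
clear
input region income
1 30000
1 .
1 34000
1 32000
2 50000
2 .
2 58000
end

* Flag values before imputing so later analysis can account for them
generate byte income_imputed = missing(income)

* Region median, computed by egen from non-missing values only
bysort region: egen income_median = median(income)
replace income = income_median if income_imputed
drop income_median
```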


TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License: not listed
Author: meleantonio
Installs: 35


Passed automated security scans.