Trustworthy Experiments

Name: Trustworthy Experiments
Author: pmprompt

🌐Community

This skill guides you through designing and running controlled experiments (A/B tests) whose results you can trust, with an emphasis on avoiding false positives and biased conclusions.

Install on your platform

Run in terminal (recommended)

claude mcp add trustworthy-experiments npx -- -y @trustedskills/trustworthy-experiments

Or manually add to ~/.claude/settings.json

{
  "mcpServers": {
    "trustworthy-experiments": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/trustworthy-experiments"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

Trustworthy Experiments is a framework for running controlled experiments (A/B tests) that yield reliable, actionable results. It addresses common experimentation pitfalls, with the aim of minimizing false positives and ensuring that findings are valid. The skill draws on industry best practices and on the work of experts such as Ronny Kohavi, who has reported that a significant share of experiments fail or produce invalid results because of problems such as sample ratio mismatch (the observed split of users between variants differs from the split the experiment was configured for).
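
Sample ratio mismatch, mentioned above, is commonly checked with a chi-square goodness-of-fit test on the assignment counts. The following is a minimal Python sketch of that check, not code taken from the skill; the function name srm_check, the 50/50 expected split, and the 0.001 alert threshold are assumptions chosen for illustration.

```python
import math

def srm_check(control_users, treatment_users, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on assignment counts.

    Returns (chi2, p_value, mismatch_flag). The default threshold of 0.001
    is an illustrative assumption, not a value prescribed by the skill.
    """
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# Hypothetical counts: a 50/50 experiment that ended up 50,000 vs 51,500.
chi2, p_value, mismatch = srm_check(50_000, 51_500)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, mismatch={mismatch}")
```

If the flag is raised, the usual advice is to find the cause of the imbalance before reading any metric results, since the comparison between variants may no longer be fair.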

When to use it

Designing A/B tests for product features or improvements.
Calculating appropriate sample sizes and experiment runtime for statistical power.
Validating the results of experiments before making decisions about implementation.
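Diagnosing unexpected or suspicious experimental outcomes (Twyman's Law: any figure that looks interesting or different is usually wrong).
Selecting metrics that balance short-term gains with long-term value (an Overall Evaluation Criterion, or OEC).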
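
The sample-size and runtime step in the list above can be sketched with the standard two-sample formula n = 2 (z_alpha/2 + z_beta)^2 * variance / delta^2 per variant. This is a rough Python illustration under stated assumptions, not the skill's own method: the metric is a conversion rate, traffic is split evenly, and the function names, the 5% baseline, the 5% relative lift, and the 20,000 daily users are all hypothetical. Rounding the runtime up to whole weeks is a common convention to cover day-of-week effects.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Users needed per variant to detect a relative lift in a conversion rate."""
    delta = baseline_rate * relative_lift          # absolute effect to detect
    variance = baseline_rate * (1 - baseline_rate) # Bernoulli variance at baseline
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / delta ** 2)

def runtime_days(users_per_variant, daily_users, variants=2):
    """Days needed to collect the sample, rounded up to whole weeks."""
    days = math.ceil(users_per_variant * variants / daily_users)
    return math.ceil(days / 7) * 7

# Hypothetical inputs: 5% baseline conversion, 5% relative lift, 20,000 users per day.
n = sample_size_per_variant(0.05, 0.05)
print(n, "users per variant,", runtime_days(n, 20_000), "days")
```

Small lifts on low baseline rates drive the required sample up quickly, because the required size grows with the inverse square of the effect being detected.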

Key capabilities

Provides a framework for designing and executing reliable A/B tests.
Helps determine appropriate sample sizes and experiment duration.
Offers guidance on validating experimental results to avoid false positives.
Addresses the importance of choosing balanced metrics (OEC).
Supports diagnosing issues with experiment outcomes using Twyman's Law.
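
As one illustration of what validating a result can involve, a difference in conversion rates is often checked with a two-proportion z-test. The sketch below is an assumption about how such a check might look in Python, not the skill's implementation; the function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(control_conversions, control_users,
                          treatment_conversions, treatment_users):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_c = control_conversions / control_users
    rate_t = treatment_conversions / treatment_users
    # Pooled rate under the null hypothesis of no difference.
    pooled = (control_conversions + treatment_conversions) / (control_users + treatment_users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / treatment_users))
    z = (rate_t - rate_c) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 500/10,000 conversions in control, 560/10,000 in treatment.
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z={z:.2f}, p={p_value:.3f}")
```

A p-value on its own is not a validation: in the spirit of Twyman's Law, a surprisingly large or small effect is a reason to recheck instrumentation and assignment before acting on it.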

Example prompts

"Help me design an A/B test for [feature name] and determine the necessary sample size."
"I ran an experiment on [metric], but the results seem suspicious. Can you help me diagnose what might be wrong?"
"What metrics should I use to evaluate the success of [product change]?"

Tips & gotchas

Requires a sufficiently large user base (at least tens of thousands of users) to produce valid results.
Controlled experiments are not appropriate for one-time decisions or situations lacking user choice.
Experiments require time; they aren't suitable for immediate decision-making.
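
To see why the size of the user base matters, it can help to estimate the smallest effect an experiment could detect with a given number of users. The following Python sketch inverts the standard sample-size formula; it assumes a conversion-rate metric, an even split, 5% significance, and 80% power, and the function name and the 5% baseline rate are hypothetical.

```python
import math
from statistics import NormalDist

def minimum_detectable_effect(baseline_rate, users_per_variant, alpha=0.05, power=0.8):
    """Smallest absolute change in a conversion rate detectable at the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate)
    return (z_alpha + z_beta) * math.sqrt(2 * variance / users_per_variant)

# Hypothetical 5% baseline conversion rate at three audience sizes.
for users in (1_000, 10_000, 100_000):
    mde = minimum_detectable_effect(0.05, users)
    print(f"{users:>7} users per variant: relative MDE = {mde / 0.05:.1%}")
```

Under these assumptions, 10,000 users per variant can only detect a relative change of about 17%, which is larger than most product changes deliver.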

TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates: what you install today is exactly what was reviewed and verified.

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Details

Version: latest
License
Author: pmprompt
Installs: 48

Passed automated security scans.