Databricks MLflow Evaluation

🌐 Community
by databricks-solutions · vlatest · Repository

Evaluates ML model performance across environments using MLflow, streamlining deployment and ensuring consistent results for reliable AI systems.

Install on your platform


1. Run in terminal (recommended)

claude mcp add databricks-mlflow-evaluation npx -- -y @trustedskills/databricks-mlflow-evaluation
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "databricks-mlflow-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/databricks-mlflow-evaluation"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets AI agents interact with Databricks MLflow evaluation metrics. The agent retrieves model performance data tracked in a Databricks MLflow environment and presents it for analysis, supporting iterative improvement and comparison across models and experiments.

When to use it

  • Model Selection: Compare the evaluation metrics of multiple trained models to determine which performs best on a given dataset.
  • Performance Monitoring: Track model performance over time and identify potential degradation requiring retraining.
  • Experiment Analysis: Analyze the impact of different hyperparameter settings or feature engineering techniques on model accuracy.
  • Debugging & Troubleshooting: Investigate unexpected results by examining evaluation metrics and identifying areas for improvement.
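For the model-selection case, once the agent has retrieved per-model metrics, choosing a winner reduces to a comparison over metric values. A minimal sketch, with hypothetical model names and numbers (not real MLflow output):

```python
# Hypothetical evaluation metrics, shaped like what the agent might retrieve.
candidates = {
    "model_a": {"accuracy": 0.91, "f1": 0.88},
    "model_b": {"accuracy": 0.89, "f1": 0.90},
}

def best_by_metric(runs, metric):
    """Return the name of the run whose given metric is highest."""
    return max(runs, key=lambda name: runs[name][metric])

print(best_by_metric(candidates, "f1"))        # model_b
print(best_by_metric(candidates, "accuracy"))  # model_a
```

Note that the "best" model can change depending on which metric you optimize, which is why prompts like the F1 comparison below are useful.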

Key capabilities

  • Retrieval of MLflow evaluation metrics
  • Analysis of model performance data
  • Comparison of models and experiments
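The performance-monitoring capability can be sketched the same way: given a series of accuracy values from successive runs (the values and threshold below are hypothetical), flag when a recent window drops below an earlier baseline:

```python
def degraded(history, baseline_window=3, recent_window=3, tolerance=0.02):
    """Flag degradation when the mean of the most recent values falls more
    than `tolerance` below the mean of the earliest baseline values."""
    baseline = sum(history[:baseline_window]) / baseline_window
    recent = sum(history[-recent_window:]) / recent_window
    return (baseline - recent) > tolerance

# Hypothetical accuracy values from runs over time.
print(degraded([0.92, 0.91, 0.92, 0.90, 0.88, 0.86]))  # True
print(degraded([0.92, 0.91, 0.92, 0.92, 0.91, 0.92]))  # False
```

A drift signal like this is the typical trigger for the retraining decision mentioned under "Performance Monitoring" above.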

Example prompts

  • "What were the accuracy scores for experiment 'my_experiment'?"
  • "Compare the F1-score between model A and model B."
  • "Show me the loss values over time for my latest training run."

Tips & gotchas

  • Requires access to a Databricks workspace with MLflow tracking enabled.
  • The agent needs appropriate permissions within the Databricks environment to retrieve evaluation metrics.
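Workspace access for MLflow's Databricks integration is commonly configured through environment variables; a sketch (both values are placeholders you must replace):

```shell
# Placeholder values — substitute your workspace URL and a personal access token.
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"
```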

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License: (not specified)
  • Author: databricks-solutions
  • Installs: 5

Passed automated security scans.