Databricks MLflow Evaluation

🌐 Community
by databricks-solutions · vlatest · Repository

Evaluates ML model performance across environments using MLflow, streamlining deployment and ensuring consistent results for reliable AI systems.

Install on your platform


1. Run in terminal (recommended)

claude mcp add databricks-mlflow-evaluation npx -- -y @trustedskills/databricks-mlflow-evaluation
2. Or manually add to ~/.claude/settings.json
{
  "mcpServers": {
    "databricks-mlflow-evaluation": {
      "command": "npx",
      "args": [
        "-y",
        "@trustedskills/databricks-mlflow-evaluation"
      ]
    }
  }
}

Requires Claude Code (claude CLI). Run claude --version to verify your install.

About This Skill

What it does

This skill lets AI agents interact with Databricks MLflow evaluation metrics. The agent retrieves model performance data tracked in a Databricks MLflow environment and presents it for analysis, supporting iterative improvement and comparison across models and experiments.

When to use it

  • Model Selection: Compare the evaluation metrics of multiple trained models to determine which performs best on a given dataset.
  • Performance Monitoring: Track model performance over time and identify potential degradation requiring retraining.
  • Experiment Analysis: Analyze the impact of different hyperparameter settings or feature engineering techniques on model accuracy.
  • Debugging & Troubleshooting: Investigate unexpected results by examining evaluation metrics and identifying areas for improvement.
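For the model-selection case, once the agent has retrieved per-model metrics, choosing a winner reduces to a comparison over metric values. A minimal sketch, with hypothetical model names and numbers (not real MLflow output):

```python
# Hypothetical evaluation metrics, shaped like what the agent might retrieve.
candidates = {
    "model_a": {"accuracy": 0.91, "f1": 0.88},
    "model_b": {"accuracy": 0.89, "f1": 0.90},
}

def best_by_metric(runs, metric):
    """Return the name of the run whose given metric is highest."""
    return max(runs, key=lambda name: runs[name][metric])

print(best_by_metric(candidates, "f1"))        # model_b
print(best_by_metric(candidates, "accuracy"))  # model_a
```

Note that the "best" model can change depending on which metric you optimize, which is why prompts like the F1 comparison below are useful.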

Key capabilities

  • Retrieval of MLflow evaluation metrics
  • Analysis of model performance data
  • Comparison of models and experiments
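The performance-monitoring capability can be sketched the same way: given a series of accuracy values from successive runs (the values and threshold below are hypothetical), flag when a recent window drops below an earlier baseline:

```python
def degraded(history, baseline_window=3, recent_window=3, tolerance=0.02):
    """Flag degradation when the mean of the most recent values falls more
    than `tolerance` below the mean of the earliest baseline values."""
    baseline = sum(history[:baseline_window]) / baseline_window
    recent = sum(history[-recent_window:]) / recent_window
    return (baseline - recent) > tolerance

# Hypothetical accuracy values from runs over time.
print(degraded([0.92, 0.91, 0.92, 0.90, 0.88, 0.86]))  # True
print(degraded([0.92, 0.91, 0.92, 0.92, 0.91, 0.92]))  # False
```

A drift signal like this is the typical trigger for the retraining decision mentioned under "Performance Monitoring" above.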

Example prompts

  • "What were the accuracy scores for experiment 'my_experiment'?"
  • "Compare the F1-score between model A and model B."
  • "Show me the loss values over time for my latest training run."

Tips & gotchas

  • Requires access to a Databricks workspace with MLflow tracking enabled.
  • The agent needs appropriate permissions within the Databricks environment to retrieve evaluation metrics.
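Workspace access for MLflow's Databricks integration is commonly configured through environment variables; a sketch (both values are placeholders you must replace):

```shell
# Placeholder values — substitute your workspace URL and a personal access token.
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"
```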

🛡️ TrustedSkills Verification

Unlike other registries that point to live repositories, TrustedSkills pins every skill to a verified commit hash. This protects you from malicious updates — what you install today is exactly what was reviewed and verified.

Security Audits

  • Gen Agent Trust Hub: Pass
  • Socket: Pass
  • Snyk: Pass

Details

  • Version: vlatest
  • License: (not specified)
  • Author: databricks-solutions
  • Installs: 5

Passed automated security scans.