
MLflow for MLOps

Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:11:16
For: Data Science, Machine Learning & Technical Interviews


MLflow is an open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and central model registry.

Main Use Cases:

  • Tracking: Logging parameters, metrics, artifacts (models, data) during experiments.
  • Projects: Packaging ML code in a reproducible format.
  • Models: Managing and deploying models using various frameworks.
  • Registry: Centralized model store for managing model versions, stages, and transitions.
  • Deployment: Deploying models to various platforms (local, cloud, Kubernetes, etc.).

Installation:

Terminal window
pip install mlflow

Setup (Local Tracking Server):

Terminal window
mlflow ui

This starts a local MLflow tracking UI accessible in your browser (usually at http://localhost:5000).
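Training code running in another process can then be pointed at this server. A minimal sketch, assuming the default port (from Python, the equivalent is mlflow.set_tracking_uri("http://localhost:5000")):

```shell
# Direct MLflow clients in this shell at the local tracking server
# (assumes the default port 5000; adjust if you passed --port to mlflow ui)
export MLFLOW_TRACKING_URI="http://localhost:5000"
echo "$MLFLOW_TRACKING_URI"
```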

Setup (Tracking Server with Database):

  1. Choose a database (e.g., PostgreSQL, MySQL).
  2. Create a database and user.
  3. Start MLflow with the database URI:
Terminal window
mlflow server \
--backend-store-uri postgresql://user:password@host:port/database \
--default-artifact-root s3://your-s3-bucket/mlflow-artifacts \
--host 0.0.0.0 # Optional: for remote access
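One easy mistake with --backend-store-uri: special characters in the password must be URL-encoded, since the value is parsed as a SQLAlchemy-style database URI. A small sketch with hypothetical credentials:

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only
user, password = "mlflow", "p@ss:word"

# '@' and ':' would break URI parsing, so percent-encode the password
uri = f"postgresql://{user}:{quote_plus(password)}@db-host:5432/mlflowdb"
print(uri)  # postgresql://mlflow:p%40ss%3Aword@db-host:5432/mlflowdb
```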

Environment Variables (Authentication):

For cloud storage artifact repositories (e.g., S3, Azure Blob Storage, GCS), set the appropriate environment variables for authentication. Examples:

  • AWS: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • Azure: AZURE_STORAGE_ACCOUNT_NAME, AZURE_STORAGE_ACCOUNT_KEY
  • GCP: GOOGLE_APPLICATION_CREDENTIALS
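These can also be set from Python before any artifact operations run. A sketch with placeholder values (never hardcode real secrets in source code):

```python
import os

# Placeholder credentials -- in practice these come from a secrets manager
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA_EXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret-key"

# boto3 (used by MLflow for s3:// artifact stores) reads these automatically
print(os.environ["AWS_ACCESS_KEY_ID"])
```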

Key Functions (Tracking):

  • mlflow.start_run(): Starts a new MLflow run. Returns a Run object.
  • mlflow.end_run(): Ends the current MLflow run. Called automatically when you use a with mlflow.start_run(): block.
  • mlflow.log_param(key, value): Logs a single parameter.
  • mlflow.log_metric(key, value, step=None): Logs a single metric. step is optional for tracking metrics over time.
  • mlflow.log_artifact(local_path, artifact_path=None): Logs a file as an artifact.
  • mlflow.log_artifacts(local_dir, artifact_path=None): Logs an entire directory as artifacts.
  • mlflow.set_tag(key, value): Sets a tag for the run.
  • mlflow.set_experiment(experiment_name): Sets the active experiment, creating it if it does not exist.
  • mlflow.active_run(): Returns the currently active run object.

Classes:

  • mlflow.Run: Represents an MLflow run. Accessible via mlflow.active_run().

Parameters:

  • mlflow.start_run(run_name=None, experiment_id=None, nested=False):
    • run_name: A human-readable name for the run.
    • experiment_id: The ID of the experiment to associate the run with.
    • nested: Whether the run is a nested run (e.g., for hyperparameter search).

Key Functions (Projects):

  • mlflow.projects.run(uri, entry_point='main', version=None, parameters=None, backend='local', backend_config=None, env_manager='local'): Runs an MLflow project.
    • uri: The URI of the project (local path, Git repository, etc.).
    • entry_point: The entry point to execute (e.g., ‘main’).
    • parameters: A dictionary of parameters to pass to the entry point.
    • backend: The backend to use (e.g., ‘local’, ‘databricks’, ‘kubernetes’).
    • backend_config: Backend-specific configuration (e.g., Databricks cluster ID).
    • env_manager: How to manage the environment (e.g., ‘local’, ‘conda’, ‘virtualenv’).

Key Functions (Models):

  • mlflow.sklearn.log_model(sk_model, artifact_path, conda_env=None, signature=None, input_example=None, registered_model_name=None): Logs a scikit-learn model. Similar functions exist for other frameworks (e.g., mlflow.tensorflow.log_model, mlflow.pytorch.log_model).
  • mlflow.sklearn.load_model(model_uri): Loads a scikit-learn model.
  • mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
  • mlflow.pyfunc.log_model(artifact_path, python_model=None, conda_env=None, code_path=None, loader_module=None, signature=None, input_example=None, registered_model_name=None): Logs a generic Python function as a model.
  • mlflow.pyfunc.load_model(model_uri): Loads a generic Python function model.
  • mlflow.models.infer_signature(model_input, model_output=None, params=None): Infers the model signature, which describes the inputs and outputs of a model.

Parameters (Model Logging):

  • artifact_path: The path within the run to store the model.
  • conda_env: The environment specification, given either as a dict or as a path to a conda.yaml file. If None, MLflow will create a default environment.
  • signature: An mlflow.models.ModelSignature object describing the model’s inputs and outputs.
  • input_example: An example input to the model, used for signature inference and deployment.
  • registered_model_name: The name to register the model under in the Model Registry.

Key Functions (Model Registry):

  • mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
  • MlflowClient.get_latest_versions(name, stages=None): Retrieves the latest versions of a registered model for the specified stages (called on an mlflow.MlflowClient instance).
  • MlflowClient.transition_model_version_stage(name, version, stage, archive_existing_versions=False): Transitions a model version to a new stage (e.g., 'Staging', 'Production').
  • mlflow.get_registry_uri(): Returns the URI of the model registry.

Stages:

  • None: Initial state of a model version.
  • Staging: Model version is being tested.
  • Production: Model version is deployed and serving live traffic.
  • Archived: Model version is no longer in use.

Deployment:

  • MLflow can deploy to various platforms, including local, cloud (AWS, Azure, GCP), Kubernetes, and more.
  • Use the mlflow models serve command to serve a model locally for testing.
  • MLflow provides tools and integrations for building Docker images and deploying to cloud platforms using their respective SDKs or CLIs.
  • MLflow also integrates with Kubernetes for deploying models as microservices.
Example: basic experiment tracking with scikit-learn:

import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Set the experiment name (optional)
mlflow.set_experiment("Linear Regression Experiment")

with mlflow.start_run() as run:
    # Log parameters
    alpha = 0.1
    mlflow.log_param("alpha", alpha)

    # Generate some sample data
    X = np.random.rand(100, 1)
    y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    predictions = model.predict(X_test)

    # Calculate and log metrics
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    mlflow.log_metric("rmse", rmse)

    # Log the model
    mlflow.sklearn.log_model(model, "model")

    # Log some sample data as an artifact
    np.savetxt("sample_data.txt", np.concatenate((X_train, y_train), axis=1))
    mlflow.log_artifact("sample_data.txt")

print(f"MLflow run completed with run_id {run.info.run_id}")

Expected Output:

The code will train a linear regression model, log parameters (alpha), a metric (RMSE), the model itself, and a data file as artifacts. You can then view the results in the MLflow UI. The print statement will output the ID of the MLflow run.

4.2. MLflow Project Example (with MLproject file)


Create a file named MLproject:

name: My ML Project
conda_env: conda.yaml
entry_points:
  main:
    command: "python train.py --alpha {alpha}"
    parameters:
      alpha: {type: float, default: 0.1}

Create a file named train.py:

import mlflow
import argparse
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
def train(alpha):
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)

        # Generate some sample data
        X = np.random.rand(100, 1)
        y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train a linear regression model
        model = LinearRegression()
        model.fit(X_train, y_train)

        # Make predictions
        predictions = model.predict(X_test)

        # Calculate and log metrics
        rmse = np.sqrt(mean_squared_error(y_test, predictions))
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "model")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--alpha", type=float, default=0.1)
    args = parser.parse_args()
    train(args.alpha)

Create a conda.yaml file:

name: mlproject-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn
  - mlflow
  - numpy

Run the project:

Terminal window
mlflow run . -P alpha=0.2

Expected Output:

This will run the train.py script as an MLflow project, logging the specified parameter (alpha=0.2) and the trained model. The MLproject file defines the project structure, dependencies, and entry point.

4.3. Registering and Transitioning a Model

import mlflow

# Assume you have a trained model logged at a specific URI
model_uri = "runs:/<run_id>/model"  # Replace <run_id> with the actual run ID

# Register the model
model_name = "MyLinearRegressionModel"
try:
    result = mlflow.register_model(model_uri, model_name)
    print(f"Successfully registered model '{model_name}' with version {result.version}")
except mlflow.exceptions.MlflowException as e:
    if "already exists" in str(e):
        print(f"Model '{model_name}' already exists. Skipping registration.")
    else:
        raise

# Transition the model to the 'Staging' stage
client = mlflow.MlflowClient()
model_version = 1  # Replace with the actual model version
client.transition_model_version_stage(
    name=model_name,
    version=model_version,
    stage="Staging",
    archive_existing_versions=False,  # Keep existing staging versions (optional)
)
print(f"Model version {model_version} transitioned to 'Staging' stage.")

# Load the model from the registry (Staging version)
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")

# Example usage with input data (replace with your actual data)
import pandas as pd

data = pd.DataFrame([[0.5]])
predictions = loaded_model.predict(data)
print(f"Predictions from Staging model: {predictions}")

# Transition to Production after testing
# client.transition_model_version_stage(
#     name=model_name,
#     version=model_version,
#     stage="Production",
#     archive_existing_versions=True,  # Archive the previous production model
# )
# print(f"Model version {model_version} transitioned to 'Production' stage.")

Expected Output:

This code registers a model, transitions it to the ‘Staging’ stage, loads the model from the registry, and makes a prediction. The transition_model_version_stage function moves the model version to a new stage, enabling controlled deployment.

Terminal window
mlflow models serve --model-uri runs:/<run_id>/model --port 5000 --host 0.0.0.0

Replace <run_id> with the actual run ID. This command starts a local REST server that serves the model on port 5000 (choose a different --port if the tracking UI is already bound to 5000). You can then send requests to the server to get predictions.

Example Request (using curl):

Terminal window
curl -X POST -H "Content-Type: application/json" -d '{"dataframe_records": [[0.5]]}' http://localhost:5000/invocations

Expected Output:

A JSON response containing the model’s prediction for the input data. The exact output will depend on your model.
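The same call can be made from Python with only the standard library. A sketch that builds the request (the actual POST is commented out, since it needs a live model server at the assumed address):

```python
import json
from urllib import request

# Payload in the "dataframe_records" format accepted by /invocations
payload = json.dumps({"dataframe_records": [[0.5]]}).encode("utf-8")

req = request.Request(
    "http://localhost:5000/invocations",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# With a model server running, this returns JSON predictions:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
print(payload.decode("utf-8"))
```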

Example: logging a custom pyfunc model that wraps a fitted preprocessing step:

import mlflow
import mlflow.pyfunc
import pandas as pd
import numpy as np
import pickle
from sklearn.preprocessing import StandardScaler

class MyCustomModel(mlflow.pyfunc.PythonModel):
    def __init__(self, scaler):
        self.scaler = scaler

    def predict(self, context, model_input):
        scaled_input = self.scaler.transform(model_input)
        return np.sum(scaled_input, axis=1)  # Example prediction logic

    def load_context(self, context):
        # This is where you can load dependencies if you need to
        return

# Train a scaler
X = np.random.rand(100, 2)
scaler = StandardScaler()
scaler.fit(X)

# Save the scaler
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Create a custom model instance
custom_model = MyCustomModel(scaler)

# Log the custom model
with mlflow.start_run() as run:
    # Define the model signature
    input_example = pd.DataFrame(np.random.rand(5, 2))
    signature = mlflow.models.infer_signature(input_example)

    # Define required packages
    conda_env = {
        "channels": ["conda-forge"],
        "dependencies": [
            "python=3.8",
            "scikit-learn",
            "pandas",
            "numpy",
            "pip",
            {"pip": ["mlflow"]},
        ],
        "name": "mlflow-env",
    }

    mlflow.pyfunc.log_model(
        python_model=custom_model,
        artifact_path="custom_model",
        conda_env=conda_env,
        signature=signature,
        input_example=input_example,
        code_path=["./"],  # Required if the model class is defined in a separate file
    )
    model_uri = mlflow.get_artifact_uri("custom_model")

print(f"Logged custom model to: {model_uri}")

# Load the model
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Make predictions
input_data = pd.DataFrame(np.random.rand(5, 2))
predictions = loaded_model.predict(input_data)
print(f"Predictions: {predictions}")

Explanation:

  • mlflow.pyfunc.PythonModel: Base class for custom Python models.
  • predict(context, model_input): The prediction function. context provides access to artifacts and dependencies.
  • load_context(self, context): Loads dependencies, if any.
  • conda_env: Defines the environment for the model.
  • code_path: Specifies any additional code files required for the model. Crucial when your model class is defined in a separate file.
  • artifact_path: The directory within the run to which the model is saved.
  • input_example: Example input for the model to infer the signature.

5.2. Hyperparameter Tuning with Nested Runs

import mlflow
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
def train_ridge(alpha):
    with mlflow.start_run(nested=True) as run:
        mlflow.log_param("alpha", alpha)

        # Generate some sample data
        X = np.random.rand(100, 1)
        y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train a ridge regression model
        model = Ridge(alpha=alpha)
        model.fit(X_train, y_train)

        # Make predictions
        predictions = model.predict(X_test)

        # Calculate and log metrics
        rmse = np.sqrt(mean_squared_error(y_test, predictions))
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "model")

        return rmse

with mlflow.start_run() as parent_run:
    alphas = [0.01, 0.1, 1.0]
    best_rmse = float('inf')
    best_alpha = None
    for alpha in alphas:
        rmse = train_ridge(alpha)
        if rmse < best_rmse:
            best_rmse = rmse
            best_alpha = alpha
    mlflow.log_metric("best_rmse", best_rmse)
    mlflow.log_param("best_alpha", best_alpha)

print(f"Best alpha: {best_alpha}, Best RMSE: {best_rmse}")

Explanation:

  • nested=True: Indicates that the run is nested under a parent run. Useful for hyperparameter search and other iterative processes.
  • The parent run logs the best hyperparameters and metrics from the nested runs.
Example: logging a model with an inferred signature:

import mlflow
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlflow.models.signature import infer_signature

# Sample data
data = pd.DataFrame({'feature1': [1, 2, 3], 'feature2': [4, 5, 6], 'target': [7, 8, 9]})
X = data[['feature1', 'feature2']]
y = data['target']

# Train a model
model = LinearRegression()
model.fit(X, y)

# Infer the signature
signature = infer_signature(X, y)

# Log the model with the signature
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model", signature=signature)

# Load the model and check the signature. Note: capture the run object inside
# the with-block; mlflow.active_run() returns None once the run has ended.
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded_model.metadata.signature)

Explanation:

  • infer_signature(model_input, model_output=None, params=None): Infers the model signature from example input and output data.
  • Signatures define the expected input and output types and names, improving model validation and deployment.
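Conceptually, a signature records each input column's name and type. A rough pure-pandas sketch of that idea (not MLflow's actual schema objects):

```python
import pandas as pd

X = pd.DataFrame({"feature1": [1.0, 2.0], "feature2": [3.0, 4.0]})

# A signature captures roughly this mapping of column name -> dtype
schema = {col: str(dtype) for col, dtype in X.dtypes.items()}
print(schema)  # {'feature1': 'float64', 'feature2': 'float64'}
```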
Tips and Best Practices:

  • Use mlflow.set_tag() for organizing runs: Tags can be used to categorize runs based on experiment type, data version, or other relevant information.
  • Use mlflow.log_artifacts() to log data: Log raw data, preprocessed data, or other relevant files as artifacts for reproducibility and debugging.
  • Use mlflow.get_run() to retrieve run information: Access parameters, metrics, tags, and artifacts associated with a specific run.
  • Use relative paths in mlflow.log_artifact() and mlflow.log_artifacts(): This makes the code more portable and independent of the current working directory.
  • Use the MLflow CLI for common tasks: The CLI provides convenient commands for running projects, serving models, and managing the Model Registry.
  • Ensure consistent environment management: Use Conda or virtualenv to create reproducible environments for your MLflow projects.
  • Leverage MLflow Recipes: MLflow Recipes offer a structured way to build production-ready ML pipelines with pre-built components and best practices.
Integrations:

  • Pandas: Pandas DataFrames are commonly used as input to MLflow models. The mlflow.models.infer_signature() function can automatically infer the signature from a DataFrame.
  • Matplotlib: Use mlflow.log_figure() to log Matplotlib plots as artifacts.
  • Scikit-learn: MLflow provides built-in support for logging and loading Scikit-learn models.
  • TensorFlow/Keras: MLflow provides built-in support for logging and loading TensorFlow and Keras models.
  • PyTorch: MLflow provides built-in support for logging and loading PyTorch models.
  • Spark: MLflow integrates with Spark for distributed training and deployment.