
MLflow for MLOps

Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:11:16
For: Data Science, Machine Learning & Technical Interviews


MLflow is an open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and central model registry.

Main Use Cases:

  • Tracking: Logging parameters, metrics, artifacts (models, data) during experiments.
  • Projects: Packaging ML code in a reproducible format.
  • Models: Managing and deploying models using various frameworks.
  • Registry: Centralized model store for managing model versions, stages, and transitions.
  • Deployment: Deploying models to various platforms (local, cloud, Kubernetes, etc.).

Installation:

Terminal window
pip install mlflow

Setup (Local Tracking Server):

Terminal window
mlflow ui

This starts a local MLflow tracking UI accessible in your browser (usually at http://localhost:5000).
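Training code running in another process can then be pointed at this server. A minimal sketch, assuming the default port (from Python, the equivalent is mlflow.set_tracking_uri("http://localhost:5000")):

```shell
# Direct MLflow clients in this shell at the local tracking server
# (assumes the default port 5000; adjust if you passed --port to mlflow ui)
export MLFLOW_TRACKING_URI="http://localhost:5000"
echo "$MLFLOW_TRACKING_URI"
```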

Setup (Tracking Server with Database):

  1. Choose a database (e.g., PostgreSQL, MySQL).
  2. Create a database and user.
  3. Start MLflow with the database URI:
Terminal window
mlflow server \
--backend-store-uri postgresql://user:password@host:port/database \
--default-artifact-root s3://your-s3-bucket/mlflow-artifacts \
--host 0.0.0.0 # Optional: for remote access
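One easy mistake with --backend-store-uri: special characters in the password must be URL-encoded, since the value is parsed as a SQLAlchemy-style database URI. A small sketch with hypothetical credentials:

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only
user, password = "mlflow", "p@ss:word"

# '@' and ':' would break URI parsing, so percent-encode the password
uri = f"postgresql://{user}:{quote_plus(password)}@db-host:5432/mlflowdb"
print(uri)  # postgresql://mlflow:p%40ss%3Aword@db-host:5432/mlflowdb
```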

Environment Variables (Authentication):

For cloud storage artifact repositories (e.g., S3, Azure Blob Storage, GCS), set the appropriate environment variables for authentication. Examples:

  • AWS: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • Azure: AZURE_STORAGE_ACCOUNT_NAME, AZURE_STORAGE_ACCOUNT_KEY
  • GCP: GOOGLE_APPLICATION_CREDENTIALS
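These can also be set from Python before any artifact operations run. A sketch with placeholder values (never hardcode real secrets in source code):

```python
import os

# Placeholder credentials -- in practice these come from a secrets manager
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA_EXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret-key"

# boto3 (used by MLflow for s3:// artifact stores) reads these automatically
print(os.environ["AWS_ACCESS_KEY_ID"])
```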

Key Functions (Tracking):

  • mlflow.start_run(): Starts a new MLflow run. Returns a Run object.
  • mlflow.end_run(): Ends the current MLflow run. Called automatically when you use a with mlflow.start_run(): block.
  • mlflow.log_param(key, value): Logs a single parameter.
  • mlflow.log_metric(key, value, step=None): Logs a single metric. step is optional for tracking metrics over time.
  • mlflow.log_artifact(local_path, artifact_path=None): Logs a file as an artifact.
  • mlflow.log_artifacts(local_dir, artifact_path=None): Logs an entire directory as artifacts.
  • mlflow.set_tag(key, value): Sets a tag for the run.
  • mlflow.set_experiment(experiment_name): Sets the active experiment, creating it if it does not exist.
  • mlflow.active_run(): Returns the currently active run object.

Classes:

  • mlflow.Run: Represents an MLflow run. Accessible via mlflow.active_run().

Parameters:

  • mlflow.start_run(run_name=None, experiment_id=None, nested=False):
    • run_name: A human-readable name for the run.
    • experiment_id: The ID of the experiment to associate the run with.
    • nested: Whether the run is a nested run (e.g., for hyperparameter search).

Key Functions (Projects):

  • mlflow.projects.run(uri, entry_point='main', version=None, parameters=None, backend='local', backend_config=None, env_manager='local'): Runs an MLflow project.
    • uri: The URI of the project (local path, Git repository, etc.).
    • entry_point: The entry point to execute (e.g., ‘main’).
    • parameters: A dictionary of parameters to pass to the entry point.
    • backend: The backend to use (e.g., ‘local’, ‘databricks’, ‘kubernetes’).
    • backend_config: Backend-specific configuration (e.g., Databricks cluster ID).
    • env_manager: How to manage the environment (e.g., ‘local’, ‘conda’, ‘virtualenv’).

Key Functions (Models):

  • mlflow.sklearn.log_model(sk_model, artifact_path, conda_env=None, signature=None, input_example=None, registered_model_name=None): Logs a scikit-learn model. Similar functions exist for other frameworks (e.g., mlflow.tensorflow.log_model, mlflow.pytorch.log_model).
  • mlflow.sklearn.load_model(model_uri): Loads a scikit-learn model.
  • mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
  • mlflow.pyfunc.log_model(artifact_path, python_model=None, conda_env=None, code_path=None, loader_module=None, signature=None, input_example=None, registered_model_name=None): Logs a generic Python function as a model.
  • mlflow.pyfunc.load_model(model_uri): Loads a generic Python function model.
  • mlflow.models.infer_signature(model_input, model_output=None, params=None): Infers the model signature, which describes the inputs and outputs of a model.

Parameters (Model Logging):

  • artifact_path: The path within the run to store the model.
  • conda_env: The environment specification, given either as a dict or as a path to a conda.yaml file. If None, MLflow will create a default environment.
  • signature: An mlflow.models.ModelSignature object describing the model’s inputs and outputs.
  • input_example: An example input to the model, used for signature inference and deployment.
  • registered_model_name: The name to register the model under in the Model Registry.

Key Functions (Model Registry):

  • mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
  • MlflowClient.get_latest_versions(name, stages=None): Retrieves the latest versions of a registered model for the specified stages (called on an mlflow.MlflowClient instance).
  • MlflowClient.transition_model_version_stage(name, version, stage, archive_existing_versions=False): Transitions a model version to a new stage (e.g., 'Staging', 'Production').
  • mlflow.get_registry_uri(): Returns the URI of the model registry.

Stages:

  • None: Initial state of a model version.
  • Staging: Model version is being tested.
  • Production: Model version is deployed and serving live traffic.
  • Archived: Model version is no longer in use.

Deployment:

  • MLflow can deploy to various platforms, including local, cloud (AWS, Azure, GCP), Kubernetes, and more.
  • Use the mlflow models serve command to serve a model locally for testing.
  • MLflow provides tools and integrations for building Docker images and deploying to cloud platforms using their respective SDKs or CLIs.
  • MLflow also integrates with Kubernetes for deploying models as microservices.
Example: basic experiment tracking with scikit-learn:

import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Set the experiment name (optional)
mlflow.set_experiment("Linear Regression Experiment")

with mlflow.start_run() as run:
    # Log parameters
    alpha = 0.1
    mlflow.log_param("alpha", alpha)

    # Generate some sample data
    X = np.random.rand(100, 1)
    y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    predictions = model.predict(X_test)

    # Calculate and log metrics
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    mlflow.log_metric("rmse", rmse)

    # Log the model
    mlflow.sklearn.log_model(model, "model")

    # Log some sample data as an artifact
    np.savetxt("sample_data.txt", np.concatenate((X_train, y_train), axis=1))
    mlflow.log_artifact("sample_data.txt")

print(f"MLflow run completed with run_id {run.info.run_id}")

Expected Output:

The code will train a linear regression model, log parameters (alpha), a metric (RMSE), the model itself, and a data file as artifacts. You can then view the results in the MLflow UI. The print statement will output the ID of the MLflow run.

4.2. MLflow Project Example (with MLproject file)


Create a file named MLproject:

name: My ML Project
conda_env: conda.yaml
entry_points:
  main:
    command: "python train.py --alpha {alpha}"
    parameters:
      alpha: {type: float, default: 0.1}

Create a file named train.py:

import mlflow
import argparse
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
def train(alpha):
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)

        # Generate some sample data
        X = np.random.rand(100, 1)
        y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train a linear regression model
        model = LinearRegression()
        model.fit(X_train, y_train)

        # Make predictions
        predictions = model.predict(X_test)

        # Calculate and log metrics
        rmse = np.sqrt(mean_squared_error(y_test, predictions))
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "model")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--alpha", type=float, default=0.1)
    args = parser.parse_args()
    train(args.alpha)

Create a conda.yaml file:

name: mlproject-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn
  - mlflow
  - numpy

Run the project:

Terminal window
mlflow run . -P alpha=0.2

Expected Output:

This will run the train.py script as an MLflow project, logging the specified parameter (alpha=0.2) and the trained model. The MLproject file defines the project structure, dependencies, and entry point.

4.3. Registering and Transitioning a Model

import mlflow

# Assume you have a trained model logged at a specific URI
model_uri = "runs:/<run_id>/model"  # Replace <run_id> with the actual run ID

# Register the model
model_name = "MyLinearRegressionModel"
try:
    result = mlflow.register_model(model_uri, model_name)
    print(f"Successfully registered model '{model_name}' with version {result.version}")
except mlflow.exceptions.MlflowException as e:
    if "already exists" in str(e):
        print(f"Model '{model_name}' already exists. Skipping registration.")
    else:
        raise

# Transition the model to the 'Staging' stage
client = mlflow.MlflowClient()
model_version = 1  # Replace with the actual model version
client.transition_model_version_stage(
    name=model_name,
    version=model_version,
    stage="Staging",
    archive_existing_versions=False,  # Keep existing staging versions (optional)
)
print(f"Model version {model_version} transitioned to 'Staging' stage.")

# Load the model from the registry (Staging version)
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")

# Example usage with input data (replace with your actual data)
import pandas as pd

data = pd.DataFrame([[0.5]])
predictions = loaded_model.predict(data)
print(f"Predictions from Staging model: {predictions}")

# Transition to Production after testing
# client.transition_model_version_stage(
#     name=model_name,
#     version=model_version,
#     stage="Production",
#     archive_existing_versions=True,  # Archive the previous production model
# )
# print(f"Model version {model_version} transitioned to 'Production' stage.")

Expected Output:

This code registers a model, transitions it to the ‘Staging’ stage, loads the model from the registry, and makes a prediction. The transition_model_version_stage function moves the model version to a new stage, enabling controlled deployment.

Terminal window
mlflow models serve --model-uri runs:/<run_id>/model --port 5000 --host 0.0.0.0

Replace <run_id> with the actual run ID. This command starts a local REST server that serves the model on port 5000 (choose a different --port if the tracking UI is already bound to 5000). You can then send requests to the server to get predictions.

Example Request (using curl):

Terminal window
curl -X POST -H "Content-Type: application/json" -d '{"dataframe_records": [[0.5]]}' http://localhost:5000/invocations

Expected Output:

A JSON response containing the model’s prediction for the input data. The exact output will depend on your model.
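The same call can be made from Python with only the standard library. A sketch that builds the request (the actual POST is commented out, since it needs a live model server at the assumed address):

```python
import json
from urllib import request

# Payload in the "dataframe_records" format accepted by /invocations
payload = json.dumps({"dataframe_records": [[0.5]]}).encode("utf-8")

req = request.Request(
    "http://localhost:5000/invocations",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# With a model server running, this returns JSON predictions:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
print(payload.decode("utf-8"))
```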

Example: logging a custom pyfunc model that wraps a fitted preprocessing step:

import mlflow
import mlflow.pyfunc
import pandas as pd
import numpy as np
import pickle
from sklearn.preprocessing import StandardScaler

class MyCustomModel(mlflow.pyfunc.PythonModel):
    def __init__(self, scaler):
        self.scaler = scaler

    def predict(self, context, model_input):
        scaled_input = self.scaler.transform(model_input)
        return np.sum(scaled_input, axis=1)  # Example prediction logic

    def load_context(self, context):
        # This is where you can load dependencies if you need to
        return

# Train a scaler
X = np.random.rand(100, 2)
scaler = StandardScaler()
scaler.fit(X)

# Save the scaler
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Create a custom model instance
custom_model = MyCustomModel(scaler)

# Log the custom model
with mlflow.start_run() as run:
    # Define the model signature
    input_example = pd.DataFrame(np.random.rand(5, 2))
    signature = mlflow.models.infer_signature(input_example)

    # Define required packages
    conda_env = {
        "channels": ["conda-forge"],
        "dependencies": [
            "python=3.8",
            "scikit-learn",
            "pandas",
            "numpy",
            "pip",
            {"pip": ["mlflow"]},
        ],
        "name": "mlflow-env",
    }

    mlflow.pyfunc.log_model(
        python_model=custom_model,
        artifact_path="custom_model",
        conda_env=conda_env,
        signature=signature,
        input_example=input_example,
        code_path=["./"],  # Required if the model class is defined in a separate file
    )
    model_uri = mlflow.get_artifact_uri("custom_model")

print(f"Logged custom model to: {model_uri}")

# Load the model
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Make predictions
input_data = pd.DataFrame(np.random.rand(5, 2))
predictions = loaded_model.predict(input_data)
print(f"Predictions: {predictions}")

Explanation:

  • mlflow.pyfunc.PythonModel: Base class for custom Python models.
  • predict(context, model_input): The prediction function. context provides access to artifacts and dependencies.
  • load_context(self, context): Loads dependencies, if any.
  • conda_env: Defines the environment for the model.
  • code_path: Specifies any additional code files required for the model. Crucial when your model class is defined in a separate file.
  • artifact_path: The directory within the run to which the model is saved.
  • input_example: Example input for the model to infer the signature.

5.2. Hyperparameter Tuning with Nested Runs

import mlflow
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
def train_ridge(alpha):
    with mlflow.start_run(nested=True) as run:
        mlflow.log_param("alpha", alpha)

        # Generate some sample data
        X = np.random.rand(100, 1)
        y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train a ridge regression model
        model = Ridge(alpha=alpha)
        model.fit(X_train, y_train)

        # Make predictions
        predictions = model.predict(X_test)

        # Calculate and log metrics
        rmse = np.sqrt(mean_squared_error(y_test, predictions))
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "model")

        return rmse

with mlflow.start_run() as parent_run:
    alphas = [0.01, 0.1, 1.0]
    best_rmse = float('inf')
    best_alpha = None
    for alpha in alphas:
        rmse = train_ridge(alpha)
        if rmse < best_rmse:
            best_rmse = rmse
            best_alpha = alpha
    mlflow.log_metric("best_rmse", best_rmse)
    mlflow.log_param("best_alpha", best_alpha)

print(f"Best alpha: {best_alpha}, Best RMSE: {best_rmse}")

Explanation:

  • nested=True: Indicates that the run is nested under a parent run. Useful for hyperparameter search and other iterative processes.
  • The parent run logs the best hyperparameters and metrics from the nested runs.
Example: logging a model with an inferred signature:

import mlflow
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlflow.models.signature import infer_signature

# Sample data
data = pd.DataFrame({'feature1': [1, 2, 3], 'feature2': [4, 5, 6], 'target': [7, 8, 9]})
X = data[['feature1', 'feature2']]
y = data['target']

# Train a model
model = LinearRegression()
model.fit(X, y)

# Infer the signature
signature = infer_signature(X, y)

# Log the model with the signature
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model", signature=signature)

# Load the model and check the signature. Note: capture the run object inside
# the with-block; mlflow.active_run() returns None once the run has ended.
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded_model.metadata.signature)

Explanation:

  • infer_signature(model_input, model_output=None, params=None): Infers the model signature from example input and output data.
  • Signatures define the expected input and output types and names, improving model validation and deployment.
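Conceptually, a signature records each input column's name and type. A rough pure-pandas sketch of that idea (not MLflow's actual schema objects):

```python
import pandas as pd

X = pd.DataFrame({"feature1": [1.0, 2.0], "feature2": [3.0, 4.0]})

# A signature captures roughly this mapping of column name -> dtype
schema = {col: str(dtype) for col, dtype in X.dtypes.items()}
print(schema)  # {'feature1': 'float64', 'feature2': 'float64'}
```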
Tips and Best Practices:

  • Use mlflow.set_tag() for organizing runs: Tags can be used to categorize runs based on experiment type, data version, or other relevant information.
  • Use mlflow.log_artifacts() to log data: Log raw data, preprocessed data, or other relevant files as artifacts for reproducibility and debugging.
  • Use mlflow.get_run() to retrieve run information: Access parameters, metrics, tags, and artifacts associated with a specific run.
  • Use relative paths in mlflow.log_artifact() and mlflow.log_artifacts(): This makes the code more portable and independent of the current working directory.
  • Use the MLflow CLI for common tasks: The CLI provides convenient commands for running projects, serving models, and managing the Model Registry.
  • Ensure consistent environment management: Use Conda or virtualenv to create reproducible environments for your MLflow projects.
  • Leverage MLflow Recipes: MLflow Recipes offer a structured way to build production-ready ML pipelines with pre-built components and best practices.
Integrations:

  • Pandas: Pandas DataFrames are commonly used as input to MLflow models. The mlflow.models.infer_signature() function can automatically infer the signature from a DataFrame.
  • Matplotlib: Use mlflow.log_figure() to log Matplotlib plots as artifacts.
  • Scikit-learn: MLflow provides built-in support for logging and loading Scikit-learn models.
  • TensorFlow/Keras: MLflow provides built-in support for logging and loading TensorFlow and Keras models.
  • PyTorch: MLflow provides built-in support for logging and loading PyTorch models.
  • Spark: MLflow integrates with Spark for distributed training and deployment.