60_Mlflow_For_Mlops
Category: AI & Data Science Tools
Type: AI/ML Tool or Library
Generated on: 2025-08-26 11:11:16
For: Data Science, Machine Learning & Technical Interviews
MLflow Cheatsheet for MLOps
1. Tool/Library Overview
MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
Main Use Cases:
- Tracking: Logging parameters, metrics, artifacts (models, data) during experiments.
- Projects: Packaging ML code in a reproducible format.
- Models: Managing and deploying models using various frameworks.
- Registry: Centralized model store for managing model versions, stages, and transitions.
- Deployment: Deploying models to various platforms (local, cloud, Kubernetes, etc.).
2. Installation & Setup
Installation:
```bash
pip install mlflow
```
Setup (Local Tracking Server):
```bash
mlflow ui
```
This starts a local MLflow tracking UI accessible in your browser (usually at http://localhost:5000).
Setup (Tracking Server with Database):
- Choose a database (e.g., PostgreSQL, MySQL).
- Create a database and user.
- Start MLflow with the database URI:
```bash
mlflow server \
  --backend-store-uri postgresql://user:password@host:port/database \
  --default-artifact-root s3://your-s3-bucket/mlflow-artifacts \
  --host 0.0.0.0  # Optional: for remote access
```
Environment Variables (Authentication):
For cloud storage artifact repositories (e.g., S3, Azure Blob Storage, GCS), set the appropriate environment variables for authentication. Examples:
- AWS:
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY - Azure:
AZURE_STORAGE_ACCOUNT_NAME,AZURE_STORAGE_ACCOUNT_KEY - GCP:
GOOGLE_APPLICATION_CREDENTIALS
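For example, the credentials might be exported before starting the tracking server; a minimal sketch where every value is a placeholder, not a real key or path:

```shell
# AWS S3 artifact store credentials (placeholder values)
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"

# GCP: point at a service-account key file (placeholder path)
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/gcp-service-account.json"
```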
3. Core Features & API
3.1. Tracking
Key Functions:
- mlflow.start_run(): Starts a new MLflow run. Returns a Run object.
- mlflow.end_run(): Ends the current MLflow run. Called automatically when using with mlflow.start_run():.
- mlflow.log_param(key, value): Logs a single parameter.
- mlflow.log_metric(key, value, step=None): Logs a single metric. step is optional for tracking metrics over time.
- mlflow.log_artifact(local_path, artifact_path=None): Logs a file as an artifact.
- mlflow.log_artifacts(local_dir, artifact_path=None): Logs an entire directory as artifacts.
- mlflow.set_tag(key, value): Sets a tag for the run.
- mlflow.set_experiment(experiment_name): Sets the active experiment, creating it if it does not exist.
- mlflow.active_run(): Returns the currently active run object.
Classes:
- mlflow.entities.Run: Represents an MLflow run. The active run is accessible via mlflow.active_run().
Parameters:
- mlflow.start_run(run_name=None, experiment_id=None, nested=False):
  - run_name: A human-readable name for the run.
  - experiment_id: The ID of the experiment to associate the run with.
  - nested: Whether the run is a nested run (e.g., for hyperparameter search).
3.2. Projects
Key Functions:
- mlflow.projects.run(uri, entry_point='main', version=None, parameters=None, backend='local', backend_config=None, env_manager='local'): Runs an MLflow project.
  - uri: The URI of the project (local path, Git repository, etc.).
  - entry_point: The entry point to execute (e.g., 'main').
  - parameters: A dictionary of parameters to pass to the entry point.
  - backend: The backend to use (e.g., 'local', 'databricks', 'kubernetes').
  - backend_config: Backend-specific configuration (e.g., Databricks cluster ID).
  - env_manager: How to manage the environment (e.g., 'local', 'conda', 'virtualenv').
3.3. Models
Key Functions:
- mlflow.sklearn.log_model(sk_model, artifact_path, conda_env=None, signature=None, input_example=None, registered_model_name=None): Logs a scikit-learn model. Similar functions exist for other frameworks (e.g., mlflow.tensorflow.log_model, mlflow.pytorch.log_model).
- mlflow.sklearn.load_model(model_uri): Loads a scikit-learn model.
- mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
- mlflow.pyfunc.log_model(python_model=None, artifact_path=None, conda_env=None, code_path=None, loader_module=None, signature=None, input_example=None, registered_model_name=None): Logs a generic Python function as a model.
- mlflow.pyfunc.load_model(model_uri): Loads a generic Python function model.
- mlflow.models.infer_signature(model_input, model_output=None, params=None): Infers the model signature, which describes the inputs and outputs of a model.
Parameters:
- artifact_path: The path within the run to store the model.
- conda_env: The path to a conda.yaml file specifying the environment. If None, MLflow will create a default environment.
- signature: An mlflow.models.ModelSignature object describing the model's inputs and outputs.
- input_example: An example input to the model, used for signature inference and deployment.
- registered_model_name: The name to register the model under in the Model Registry.
3.4. Registry
Key Functions:
- mlflow.register_model(model_uri, name): Registers a model in the MLflow Model Registry.
- MlflowClient.get_latest_versions(name, stages=None): Retrieves the latest versions of a registered model for the specified stages.
- MlflowClient.transition_model_version_stage(name, version, stage, archive_existing_versions=False): Transitions a model version to a new stage (e.g., 'Staging', 'Production').
- mlflow.get_registry_uri(): Returns the URI of the model registry.
Stages:
- None: Initial state of a model version.
- Staging: Model version is being tested.
- Production: Model version is deployed and serving live traffic.
- Archived: Model version is no longer in use.
3.5 Deployment
- MLflow can deploy to various platforms, including local, cloud (AWS, Azure, GCP), Kubernetes, and more.
- Use the mlflow models serve command to serve a model locally for testing.
- MLflow provides tools and integrations for building Docker images and deploying to cloud platforms using their respective SDKs or CLIs.
- MLflow also integrates with Kubernetes for deploying models as microservices.
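As a sketch of the Docker path (the build and run commands are shown commented out so nothing is built accidentally; `<run_id>` and the image name are placeholders):

```shell
MODEL_URI="runs:/<run_id>/model"   # placeholder: substitute a real run ID

# Build a serving image from the logged model, then run it locally:
# mlflow models build-docker --model-uri "$MODEL_URI" --name my-mlflow-model
# docker run -p 5001:8080 my-mlflow-model   # the image serves on port 8080 internally
echo "Image would be built from $MODEL_URI"
```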
4. Practical Examples
4.1. Basic Tracking Example
Section titled “4.1. Basic Tracking Example”import mlflowimport numpy as npfrom sklearn.linear_model import LinearRegressionfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import mean_squared_error
# Set the experiment name (optional)mlflow.set_experiment("Linear Regression Experiment")
with mlflow.start_run() as run: # Log parameters alpha = 0.1 mlflow.log_param("alpha", alpha)
# Generate some sample data X = np.random.rand(100, 1) y = 2 * X + 1 + 0.1 * np.random.randn(100, 1) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a linear regression model model = LinearRegression() model.fit(X_train, y_train)
# Make predictions predictions = model.predict(X_test)
# Calculate and log metrics rmse = np.sqrt(mean_squared_error(y_test, predictions)) mlflow.log_metric("rmse", rmse)
# Log the model mlflow.sklearn.log_model(model, "model")
# Log some sample data as an artifact np.savetxt("sample_data.txt", np.concatenate((X_train, y_train), axis=1)) mlflow.log_artifact("sample_data.txt")
print(f"MLflow run completed with run_id {run.info.run_id}")Expected Output:
The code will train a linear regression model, log parameters (alpha), a metric (RMSE), the model itself, and a data file as artifacts. You can then view the results in the MLflow UI. The print statement will output the ID of the MLflow run.
4.2. MLflow Project Example (with MLproject file)
Create a file named MLproject:
```yaml
name: My ML Project
conda_env: conda.yaml
entry_points:
  main:
    command: "python train.py --alpha {alpha}"
    parameters:
      alpha: {type: float, default: 0.1}
```
Create a file named train.py:
```python
import mlflow
import argparse
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def train(alpha):
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)

        # Generate some sample data
        X = np.random.rand(100, 1)
        y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train a linear regression model
        model = LinearRegression()
        model.fit(X_train, y_train)

        # Make predictions
        predictions = model.predict(X_test)

        # Calculate and log metrics
        rmse = np.sqrt(mean_squared_error(y_test, predictions))
        mlflow.log_metric("rmse", rmse)

        # Log the model
        mlflow.sklearn.log_model(model, "model")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--alpha", type=float, default=0.1)
    args = parser.parse_args()
    train(args.alpha)
```
Create a conda.yaml file:
```yaml
name: mlproject-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn
  - mlflow
  - numpy
```
Run the project:
```bash
mlflow run . -P alpha=0.2
```
Expected Output:
This will run the train.py script as an MLflow project, logging the specified parameter (alpha=0.2) and the trained model. The MLproject file defines the project structure, dependencies, and entry point.
4.3. Registering and Transitioning a Model
```python
import mlflow
import pandas as pd

# Assume you have a trained model logged at a specific URI
model_uri = "runs:/<run_id>/model"  # Replace <run_id> with the actual run ID

# Register the model
model_name = "MyLinearRegressionModel"
try:
    result = mlflow.register_model(model_uri, model_name)
    print(f"Successfully registered model '{model_name}' with version {result.version}")
except mlflow.exceptions.MlflowException as e:
    if "already exists" in str(e):
        print(f"Model '{model_name}' already exists. Skipping registration.")
    else:
        raise e

# Transition the model to the 'Staging' stage
client = mlflow.MlflowClient()
model_version = 1  # Replace with the actual model version
client.transition_model_version_stage(
    name=model_name,
    version=model_version,
    stage="Staging",
    archive_existing_versions=False,  # Keep existing staging versions (optional)
)

print(f"Model version {model_version} transitioned to 'Staging' stage.")

# Load the model from the registry (Staging version)
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")

# Example usage with input data (replace with your actual data)
data = pd.DataFrame([[0.5]])
predictions = loaded_model.predict(data)
print(f"Predictions from Staging model: {predictions}")

# Transition to Production after testing
# client.transition_model_version_stage(
#     name=model_name,
#     version=model_version,
#     stage="Production",
#     archive_existing_versions=True,  # Archive the previous production model
# )
# print(f"Model version {model_version} transitioned to 'Production' stage.")
```
Expected Output:
This code registers a model, transitions it to the ‘Staging’ stage, loads the model from the registry, and makes a prediction. The transition_model_version_stage function moves the model version to a new stage, enabling controlled deployment.
4.4. Deploying a Model Locally
```bash
mlflow models serve --model-uri runs:/<run_id>/model --port 5000 --host 0.0.0.0
```
Replace <run_id> with the actual run ID. This command starts a local REST server that serves the model on port 5000. You can then send requests to the server to get predictions.
Example Request (using curl):
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"dataframe_records": [[0.5]]}' \
  http://localhost:5000/invocations
```
Expected Output:
A JSON response containing the model’s prediction for the input data. The exact output will depend on your model.
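The same request can be sent from Python using only the standard library; a minimal sketch that assumes the server from 4.4 is running on localhost:5000 (and prints a note if it is not):

```python
import json
from urllib import request

# Build the payload in MLflow's 'dataframe_records' scoring format
payload = json.dumps({"dataframe_records": [[0.5]]}).encode("utf-8")
req = request.Request(
    "http://localhost:5000/invocations",
    data=payload,
    headers={"Content-Type": "application/json"},
)
try:
    with request.urlopen(req, timeout=5) as resp:
        print(resp.read().decode())  # JSON with the model's predictions
except Exception as exc:
    print(f"No model server reachable: {exc}")
```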
5. Advanced Usage
Section titled “5. Advanced Usage”5.1. Custom Python Model (PyFunc)
Section titled “5.1. Custom Python Model (PyFunc)”import mlflowimport mlflow.pyfuncimport pandas as pdimport numpy as np
class MyCustomModel(mlflow.pyfunc.PythonModel): def __init__(self, scaler): self.scaler = scaler
def predict(self, context, model_input): scaled_input = self.scaler.transform(model_input) return np.sum(scaled_input, axis=1) # Example prediction logic
def load_context(self, context): # this is where you can load dependencies if you need to return
import picklefrom sklearn.preprocessing import StandardScaler
# Train a scalerX = np.random.rand(100, 2)scaler = StandardScaler()scaler.fit(X)
# Save the scalerwith open("scaler.pkl", "wb") as f: pickle.dump(scaler, f)
# Create a custom model instancecustom_model = MyCustomModel(scaler)
# Log the custom modelwith mlflow.start_run() as run:
# Define the model signature input_example = pd.DataFrame(np.random.rand(5, 2)) signature = mlflow.models.infer_signature(input_example)
# Define required packages conda_env = { "channels": ["conda-forge"], "dependencies": [ "python=3.8", "scikit-learn", "pandas", "numpy", "pip", {"pip": ["mlflow"]} ], "name": "mlflow-env" }
mlflow.pyfunc.log_model( python_model=custom_model, artifact_path="custom_model", conda_env=conda_env, signature=signature, input_example=input_example, code_path = ["./"] # required if the model class is in the same directory )
model_uri = mlflow.get_artifact_uri("custom_model") print(f"Logged custom model to: {model_uri}")
# Load the model loaded_model = mlflow.pyfunc.load_model(model_uri)
# Make predictions input_data = pd.DataFrame(np.random.rand(5, 2)) predictions = loaded_model.predict(input_data) print(f"Predictions: {predictions}")Explanation:
- mlflow.pyfunc.PythonModel: Base class for custom Python models.
- predict(context, model_input): The prediction function. context provides access to artifacts and dependencies.
- load_context(self, context): Loads dependencies, if any.
- conda_env: Defines the environment for the model.
- code_path: Specifies any additional code files required for the model. Crucial when your model class is defined in a separate file.
- artifact_path: The directory in the run to which this model is saved.
- input_example: Example input for the model to infer the signature.
5.2. Hyperparameter Tuning with Nested Runs
Section titled “5.2. Hyperparameter Tuning with Nested Runs”import mlflowimport numpy as npfrom sklearn.linear_model import Ridgefrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import mean_squared_error
def train_ridge(alpha): with mlflow.start_run(nested=True) as run: mlflow.log_param("alpha", alpha) # Generate some sample data X = np.random.rand(100, 1) y = 2 * X + 1 + 0.1 * np.random.randn(100, 1) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a ridge regression model model = Ridge(alpha=alpha) model.fit(X_train, y_train)
# Make predictions predictions = model.predict(X_test)
# Calculate and log metrics rmse = np.sqrt(mean_squared_error(y_test, predictions)) mlflow.log_metric("rmse", rmse)
# Log the model mlflow.sklearn.log_model(model, "model") return rmse
with mlflow.start_run() as parent_run: alphas = [0.01, 0.1, 1.0] best_rmse = float('inf') best_alpha = None
for alpha in alphas: rmse = train_ridge(alpha) if rmse < best_rmse: best_rmse = rmse best_alpha = alpha
mlflow.log_metric("best_rmse", best_rmse) mlflow.log_param("best_alpha", best_alpha) print(f"Best alpha: {best_alpha}, Best RMSE: {best_rmse}")Explanation:
- nested=True: Indicates that the run is nested under a parent run. Useful for hyperparameter search and other iterative processes.
- The parent run logs the best hyperparameters and metrics from the nested runs.
5.3. Using Model Signatures
Section titled “5.3. Using Model Signatures”import mlflowimport pandas as pdfrom sklearn.linear_model import LinearRegressionfrom mlflow.models.signature import infer_signature
# Sample datadata = pd.DataFrame({'feature1': [1, 2, 3], 'feature2': [4, 5, 6], 'target': [7, 8, 9]})X = data[['feature1', 'feature2']]y = data['target']
# Train a modelmodel = LinearRegression()model.fit(X, y)
# Infer the signaturesignature = infer_signature(X, y)
# Log the model with the signaturewith mlflow.start_run(): mlflow.sklearn.log_model(model, "model", signature=signature)
# Load the model and check the signatureloaded_model = mlflow.pyfunc.load_model("runs:/{run_id}/model".format(run_id=mlflow.active_run().info.run_id))print(loaded_model.metadata.signature)Explanation:
- infer_signature(model_input, model_output=None, params=None): Infers the model signature from the input and output data.
- Signatures define the expected input and output types and names, improving model validation and deployment.
6. Tips & Tricks
- Use mlflow.set_tag() for organizing runs: Tags can be used to categorize runs based on experiment type, data version, or other relevant information.
- Use mlflow.log_artifacts() to log data: Log raw data, preprocessed data, or other relevant files as artifacts for reproducibility and debugging.
- Use mlflow.get_run() to retrieve run information: Access parameters, metrics, tags, and artifacts associated with a specific run.
- Use relative paths in mlflow.log_artifact() and mlflow.log_artifacts(): This makes the code more portable and independent of the current working directory.
- Use the MLflow CLI for common tasks: The CLI provides convenient commands for running projects, serving models, and managing the Model Registry.
- Ensure consistent environment management: Use Conda or virtualenv to create reproducible environments for your MLflow projects.
- Leverage MLflow Recipes: MLflow Recipes offer a structured way to build production-ready ML pipelines with pre-built components and best practices.
7. Integration
- Pandas: Pandas DataFrames are commonly used as input to MLflow models. The mlflow.models.infer_signature() function can automatically infer the signature from a DataFrame.
- Matplotlib: Use mlflow.log_figure() to log Matplotlib plots as artifacts.
- Scikit-learn: MLflow provides built-in support for logging and loading Scikit-learn models.
- TensorFlow/Keras: MLflow provides built-in support for logging and loading TensorFlow and Keras models.
- PyTorch: MLflow provides built-in support for logging and loading PyTorch models.
- Spark: MLflow integrates with Spark for distributed training and deployment.
8. Further Resources
- Official MLflow Documentation: https://www.mlflow.org/docs/latest/index.html
- MLflow Examples: https://github.com/mlflow/mlflow/tree/master/examples
- MLflow Recipes: https://mlflow.org/docs/latest/recipes/index.html
- MLflow Blog: https://www.databricks.com/blog/tag/mlflow
- Community Slack: https://mlflow.org/community/