MLOps: Operationalizing Machine Learning at Scale


Many machine learning projects never make it to production. Usually this is not because the models are bad. It is because training a model in a Jupyter notebook is only a small fraction of the challenge. The real work is building a stable, repeatable platform that can train, deploy, and monitor that model while the underlying data keeps shifting.

In traditional software development, we build the code once, and as long as the environment does not change, the application behaves predictably. In machine learning, your code might not change at all, yet if the distribution of incoming user data shifts, the quality of your model's predictions degrades. This is usually called model drift (or data drift, when it is the inputs themselves that have shifted), and it is why traditional DevOps pipelines are not quite enough.
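One common way to quantify drift is the population stability index (PSI). It compares the distribution of a feature's values in the training data with their distribution in live traffic. The sketch below is a minimal standard-library version. The bin edges, the sample data, and the 0.2 alert threshold are all assumptions for illustration, not values taken from any particular monitoring tool. The threshold is a widely quoted rule of thumb.

```python
import math
from bisect import bisect_right

def bin_fractions(values, edges):
    """Fraction of values in each bin; `edges` are sorted interior bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e); eps guards against empty bins."""
    psi = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical values of one feature at training time and in two live batches.
training = [(i + 0.5) / 100 for i in range(100)]             # spread evenly over [0, 1)
live_same = list(reversed(training))                          # same distribution, new order
live_shifted = [0.6 + (i + 0.5) / 250 for i in range(100)]    # crowded into [0.6, 1)

EDGES = [0.2, 0.4, 0.6, 0.8]  # assumed bin boundaries (5 bins)
DRIFT_THRESHOLD = 0.2         # common rule of thumb, not a standard

for name, batch in [("same", live_same), ("shifted", live_shifted)]:
    psi = population_stability_index(training, batch, EDGES)
    print(f"{name}: PSI={psi:.3f}, drift={psi > DRIFT_THRESHOLD}")
```

Neither the code nor the model changed between the two batches. Only a check on the data itself catches the second one.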


Where traditional DevOps falls short

DevOps gave us continuous integration and continuous delivery. We commit code, run automated tests, build a container image, and deploy it.

When we transition to machine learning, we have three distinct axes of change:

  1. Code: The training scripts, data preparation pipelines, and serving code.
  2. Data: The historical features used for training and the live inference inputs.
  3. Model: The trained weights (the model artifact) produced by a training run.

If you only version control your code, you cannot reproduce your model. You also need the exact dataset version and hyperparameter settings used to train that specific artifact.
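To see why all three axes matter, derive a single fingerprint for a training run from three inputs: its code commit, a content hash of its data, and its hyperparameters. This is a hypothetical standard-library sketch, not how MLflow or any other tool stores lineage. The commit hash and the rows are made up.

```python
import hashlib
import json

def dataset_digest(rows):
    """Content hash of a dataset: changing any row changes the digest."""
    h = hashlib.sha256()
    for row in rows:
        # sort_keys makes the encoding independent of dict insertion order
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def run_fingerprint(git_commit, data_digest, hyperparams):
    """Combine code, data, and hyperparameters into one reproducibility key."""
    record = {"code": git_commit, "data": data_digest, "params": hyperparams}
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

rows = [{"age": 34, "clicked": 1}, {"age": 51, "clicked": 0}]
params = {"learning_rate": 0.001, "batch_size": 64}
commit = "3f2a9c1"  # hypothetical Git commit hash

base = run_fingerprint(commit, dataset_digest(rows), params)
# Same code, same hyperparameters, one edited row: the fingerprint changes.
edited_rows = rows[:1] + [{"age": 52, "clicked": 0}]
changed = run_fingerprint(commit, dataset_digest(edited_rows), params)
print(base != changed)  # True
```

Changing any one axis changes the fingerprint. That is why versioning the code alone cannot identify a model.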

Furthermore, machine learning workloads are computationally heavy and highly stateful. A training run might pull terabytes of data, need multiple GPUs for days, and produce a multi-gigabyte artifact. That does not fit on a standard GitHub-hosted Actions runner, which has no GPU and stops any job after six hours.

To solve this, many teams reach for two open-source tools that address different levels of the stack: MLflow for experiment tracking and model registration, and Kubeflow for running containerized training jobs and pipelines at scale.


Experiment tracking with MLflow

When data scientists are iterating on a model, they run hundreds of experiments with different parameters, feature sets, and architectures. Without a central tool, this usually ends up documented in scattered spreadsheets or local notebooks.

This is where MLflow shines. It is lightweight, database-backed, and fits into existing Python code with a few lines of instrumentation.

Tracking and reproducibility

MLflow Tracking provides a central dashboard where every training run is logged. We record:

  • Hyperparameters (learning rates, batch sizes, optimizer choices)
  • Metrics (validation loss, precision, recall, F1 score)
  • Artifacts (loss curves, confusion matrices, and the serialized model weights)

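Instead of guessing which training run produced a specific output, you can query the MLflow API for several pieces of provenance. These include the run's parameters, its Git commit (recorded automatically when the run is launched from a Git repository), and its training dataset (if you logged it with the run).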
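In MLflow itself, these records come from calls such as mlflow.log_params and mlflow.log_metric inside a mlflow.start_run() block. Runs can be queried afterwards with mlflow.search_runs. When a run starts from inside a Git repository, MLflow also records the commit in the mlflow.source.git.commit tag. The sketch below shows the shape of that data without a tracking server. It uses a hypothetical in-memory stand-in: a RunRecord type, made-up runs, and a best_run query.

```python
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    """What a tracking server keeps per run (simplified, hypothetical schema)."""
    run_id: str
    params: dict
    metrics: dict
    tags: dict = field(default_factory=dict)

runs = [
    RunRecord("run-a", {"lr": 0.01, "batch_size": 32}, {"val_f1": 0.81},
              {"git_commit": "a1b2c3d", "dataset": "clicks-v3"}),
    RunRecord("run-b", {"lr": 0.001, "batch_size": 64}, {"val_f1": 0.86},
              {"git_commit": "e4f5a6b", "dataset": "clicks-v4"}),
    RunRecord("run-c", {"lr": 0.001, "batch_size": 128}, {"val_f1": 0.84},
              {"git_commit": "e4f5a6b", "dataset": "clicks-v4"}),
]

def best_run(runs, metric):
    """Highest-scoring run for `metric`, skipping runs that never logged it."""
    candidates = [r for r in runs if metric in r.metrics]
    return max(candidates, key=lambda r: r.metrics[metric])

best = best_run(runs, "val_f1")
# Everything needed to reproduce the winner travels with the run.
print(best.run_id, best.tags["git_commit"], best.tags["dataset"])  # run-b e4f5a6b clicks-v4
```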

The Model Registry

Once a model is trained and validated, it needs to move from an experiment artifact to a managed release. The MLflow Model Registry provides versioning for these model artifacts. It supports:

  • Centralized discovery across different teams
  • Stage promotions (such as moving from Staging to Production; recent MLflow releases deprecate stages in favor of version aliases such as "champion")
  • Lineage tracing back to the exact run that produced the model
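The bookkeeping behind a registry is small. The sketch below is a hypothetical in-memory model of it, not MLflow's API, which goes through mlflow.register_model and MlflowClient. Each registered version records the run that produced it. Promoting a version to Production archives whichever version held that stage before, which is a policy chosen for this sketch.

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    version: int
    source_run_id: str  # lineage: the training run that produced this version
    stage: str = "None"

class ModelRegistry:
    """Toy in-memory registry: versioning, stage promotion, and lineage."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self._models = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, run_id):
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, source_run_id=run_id)
        versions.append(mv)
        return mv

    def transition(self, name, version, stage):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        if stage == "Production":
            # Keep a single Production version: archive the current one.
            for mv in self._models[name]:
                if mv.stage == "Production":
                    mv.stage = "Archived"
        target = self._models[name][version - 1]
        target.stage = stage
        return target

    def get(self, name, stage):
        return next(mv for mv in self._models[name] if mv.stage == stage)

registry = ModelRegistry()
registry.register("churn-classifier", run_id="run-a")
registry.register("churn-classifier", run_id="run-b")
registry.transition("churn-classifier", 1, "Production")
registry.transition("churn-classifier", 2, "Staging")
# After validation in Staging, promote v2; v1 is archived automatically.
registry.transition("churn-classifier", 2, "Production")
prod = registry.get("churn-classifier", "Production")
print(prod.version, prod.source_run_id)  # 2 run-b
```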

MLflow is great at the metadata layer. However, it is not a container orchestrator. It does not know how to distribute a PyTorch training run across forty GPUs or manage spot instances. For that, we need to go deeper into the infrastructure layer.


Scaling on Kubernetes with Kubeflow

Kubeflow is not a single tool. It is a collection of Kubernetes-native components designed to handle the heavy compute requirements of machine learning. Most of them use Kubernetes custom resources and controllers to schedule and manage workloads.

Kubeflow Pipelines

If you write a monolithic script that downloads data, cleans it, trains a model, and saves the weights, you are building a fragile system. If the script fails at the training step, you have to restart the data download from scratch.

Kubeflow Pipelines solves this by breaking the workflow into isolated, containerized steps. Each step reads its inputs from upstream steps and writes its outputs as artifacts to object storage. If the training step fails, you can fix the code and rerun the pipeline. Steps whose code and inputs have not changed, such as data preparation, are served from the cache instead of running again.
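The caching behaviour can be sketched in a few lines. Kubeflow Pipelines derives its cache key from each component's specification and inputs. The hypothetical CachedPipeline below mimics that idea with a hash of the step name, a code version string, and the step's inputs. It uses an in-memory dict where a real pipeline would use object storage. The step functions and the s3:// path are placeholders.

```python
import hashlib
import json

class CachedPipeline:
    """Runs steps in order; skips any step whose (name, version, inputs) already ran."""

    def __init__(self):
        self.cache = {}     # cache key -> step output (stands in for object storage)
        self.executed = []  # names of steps that actually ran

    @staticmethod
    def _key(name, version, inputs):
        payload = json.dumps([name, version, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_step(self, name, version, fn, inputs):
        key = self._key(name, version, inputs)
        if key in self.cache:
            return self.cache[key]
        output = fn(inputs)  # if this raises, nothing is cached for the step
        self.executed.append(name)
        self.cache[key] = output
        return output

def download(_):
    return [3.0, 1.0, None, 2.0]  # hypothetical raw data

def clean(raw):
    return [x for x in raw if x is not None]

def train_buggy(data):
    raise RuntimeError("bug in training code")

def train_fixed(data):
    return {"mean": sum(data) / len(data)}  # stand-in for a real model

def run_pipeline(p, train_fn, train_version):
    raw = p.run_step("download", "v1", download, {"source": "s3://bucket/clicks"})
    data = p.run_step("clean", "v1", clean, raw)
    return p.run_step("train", train_version, train_fn, data)

p = CachedPipeline()
try:
    run_pipeline(p, train_buggy, "v1")
except RuntimeError:
    pass
print(p.executed)  # ['download', 'clean']
model = run_pipeline(p, train_fixed, "v2")
print(p.executed)  # ['download', 'clean', 'train']
print(model)       # {'mean': 2.0}
```

The second run re-executes only the step whose code version changed. The download and clean steps are served from the cache.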

Distributed training

Training modern deep learning models or fine-tuning large language models often requires massive parallelization.

The Kubeflow Training Operator simplifies this by managing distributed training jobs for frameworks like PyTorch, TensorFlow, and JAX. Its next major version, Kubeflow Trainer, replaces the per-framework job kinds with a single TrainJob resource. Instead of manually configuring network sockets and environment variables across multiple worker pods, you define a custom resource such as a PyTorchJob:

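apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llama-finetune
spec:
  pytorchReplicaSpecs:
    # The Master replica is rank 0 and also runs training, so it needs a GPU too.
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch   # the operator expects this container name
              image: registry/llamatrain:v1
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 8
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: registry/llamatrain:v1
              resources:
                limits:
                  nvidia.com/gpu: 1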

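Kubeflow creates the pods and injects the rendezvous settings (such as the master address, world size, and rank) that the workers need to find each other. Depending on the job's cleanup policy, it also removes the pods when the job finishes.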
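On the training-script side, the operator's job is mostly to set environment variables. Assuming the standard PyTorchJob setup, each pod receives MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK. These are the variables that the default env:// initialization in torch.distributed reads. The sketch below parses them into a small config object, with single-process defaults so the same script also runs on a laptop. DistConfig and read_dist_config are hypothetical names, and the values in the usage line are illustrative.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class DistConfig:
    master_addr: str
    master_port: int
    world_size: int
    rank: int

    @property
    def is_chief(self) -> bool:
        # By convention, rank 0 writes checkpoints and logs metrics.
        return self.rank == 0

def read_dist_config(env=None) -> DistConfig:
    """Read rendezvous settings injected into the pod, with single-process fallbacks."""
    env = os.environ if env is None else env
    return DistConfig(
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", "29500")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        rank=int(env.get("RANK", "0")),
    )

# Illustrative values, as one worker pod might see them.
cfg = read_dist_config({"MASTER_ADDR": "llama-finetune-master-0", "MASTER_PORT": "23456",
                        "WORLD_SIZE": "9", "RANK": "3"})
print(cfg.world_size, cfg.rank, cfg.is_chief)  # 9 3 False
# In the real script these same variables are consumed by
# torch.distributed.init_process_group(backend="nccl"), whose default
# init method is "env://".
```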

Production serving with KServe

Serving a model in production is different from serving a traditional web API. You need request batching, which combines multiple single inference requests so a GPU can process them together. You also need automatic scaling based on load, and canary deployments.
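Request batching comes down to two knobs: the largest batch you are willing to build, and the longest a request may wait for that batch to fill. KServe's request batcher is configured with settings of the same kind (maxBatchSize and maxLatency). The sketch below is a hypothetical offline simulation of that policy over a list of arrival times, not KServe's implementation.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival_ms: float
    payload: str

def form_batches(requests, max_batch_size, max_latency_ms):
    """Group requests (sorted by arrival time) into batches.

    A batch closes when it is full, or when the next request arrives more than
    max_latency_ms after the batch's first request. In a live server, a timer
    would flush the batch at that deadline instead.
    """
    batches, current = [], []
    for req in requests:
        if current and (len(current) == max_batch_size
                        or req.arrival_ms - current[0].arrival_ms > max_latency_ms):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches

arrivals = [0, 2, 3, 5, 40, 41, 100]  # hypothetical arrival times in ms
reqs = [Request(t, f"req-{i}") for i, t in enumerate(arrivals)]
for batch in form_batches(reqs, max_batch_size=3, max_latency_ms=10):
    print([r.payload for r in batch])
```

During a burst, batches fill quickly and keep the GPU busy. In quiet periods, the latency cap stops a lone request from waiting for company that never arrives.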

KServe provides serverless model serving on top of Knative and Istio. It abstracts away the scaling and traffic-routing logic. You can send a small percentage of traffic to a new model version and monitor its latency before completing the rollout.
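KServe expresses a canary as a percentage on the InferenceService (the canaryTrafficPercent field) and leaves the actual split to Knative's traffic routing. The sketch below only illustrates the idea of a percentage split. Hashing the request ID is a swapped-in technique, chosen here so that a retried request lands on the same version. The route function is hypothetical.

```python
import hashlib

def route(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of requests to the new model version.

    Hashing the request ID (instead of calling random()) makes the choice
    deterministic, so a retry of the same request hits the same version.
    """
    bucket = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

routed = [route(f"req-{i}", canary_percent=10) for i in range(10_000)]
share = routed.count("canary") / len(routed)
print(f"canary share: {share:.3f}")
```

Completing the rollout then amounts to raising canary_percent step by step while the new version's latency and error rate stay healthy.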


Building a pragmatic MLOps stack

Many organizations make the mistake of deploying the entire Kubeflow suite on day one. They end up overwhelmed by the operational complexity of managing Istio, Knative, Dex, and multiple custom controllers.

My advice is to start small:

  1. Keep storage simple: Use an S3 or Google Cloud Storage bucket for artifact storage and MinIO for local development.
  2. Use MLflow first: It has a low barrier to entry and solves the immediate problem of experiment tracking and model versioning.
  3. Adopt Kubeflow selectively: Start with Kubeflow Pipelines to automate your training runs. Only deploy the Trainer or KServe when you genuinely need distributed GPU training or advanced serverless scaling.
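MinIO is useful for the storage advice because it speaks the S3 API. The same s3:// artifact URIs work locally and in the cloud, and only the endpoint changes. MLflow's S3 client can be pointed at MinIO through the MLFLOW_S3_ENDPOINT_URL environment variable. The helper below is a hypothetical sketch of per-environment settings. Several details are assumptions: the bucket names, the ARTIFACT_ROOT key (something you might pass to mlflow server --default-artifact-root), and the use of MinIO's default minioadmin credentials for a throwaway local instance.

```python
def artifact_store_settings(environment: str) -> dict:
    """Hypothetical per-environment settings for an S3-compatible artifact store."""
    if environment == "local":
        return {
            # A local MinIO container serves the S3 API on port 9000 by default.
            "MLFLOW_S3_ENDPOINT_URL": "http://localhost:9000",
            # MinIO's default root credentials: acceptable only for a local sandbox.
            "AWS_ACCESS_KEY_ID": "minioadmin",
            "AWS_SECRET_ACCESS_KEY": "minioadmin",
            "ARTIFACT_ROOT": "s3://mlflow-artifacts",
        }
    if environment == "prod":
        # No endpoint override: the client talks to real S3, and credentials
        # come from the platform's IAM role rather than from code.
        return {"ARTIFACT_ROOT": "s3://example-ml-artifacts/mlflow"}
    raise ValueError(f"unknown environment: {environment!r}")

for env in ("local", "prod"):
    print(env, artifact_store_settings(env)["ARTIFACT_ROOT"])
```

Because both environments use s3:// URIs, training and serving code never branches on where artifacts live.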

At the end of the day, MLOps is not about buying a fancy platform. It is about applying basic engineering hygiene to data and models: automate the boring steps, version control your state, and monitor your outputs.

