If you're coming to machine learning engineering from a software engineering background, you likely already have extensive knowledge of Docker. You can skip most of this post and just read part 3 (how to deploy a ML model in docker).
If you're coming from a data-related background, you may have zero knowledge of Docker, so I'd recommend reading this quite a few times. And watching the videos too.
With that said, let’s get to it.
Table of Contents:
Introduction to Docker
Docker objects
Deploy a ML model in docker
1 - Introduction to Docker
Docker has emerged as a pivotal tool in deploying machine learning models efficiently and consistently. This section will explore Docker's role in MLOps and guide you through setting it up for your ML projects.
1.1 Understanding Docker
Docker is an open-source platform that uses containerization to make it easier to create, deploy, and run applications. Containers allow a developer to package up an application with all the parts it needs, such as libraries and dependencies, and ship it all out as one package.
1.2 “It works on my machine” conundrum
One of the most common frustrations in software development is the "it works on my machine" problem. This is where an application runs flawlessly on the developer's machine but encounters issues in other environments. Docker mitigates this problem by ensuring that the application runs within a container with all its dependencies. This container can be moved between any system that has Docker installed, and the application will run without hitches.
FYI, "it works on my machine" is also a pretty funny meme.
1.3 Setting up Docker
Installing Docker:
Windows and Mac Users: Download Docker Desktop from the official Docker website. It provides a GUI and the Docker Engine.
Linux Users: first add Docker's official apt repository (see the Docker Engine install docs for your distribution), then run:
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Installing Docker Engine:
Docker Engine is included in the Docker Desktop installation for Windows and Mac. On Linux, the commands above install Docker Engine directly.
Starting Docker Engine:
Windows/Mac: The Docker Desktop application needs to be running.
Linux:
sudo systemctl start docker
Using Docker Compose:
Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services.
Installation: Docker Compose is included with Docker Desktop for Windows and Mac. For Linux, follow the official Docker Compose installation guide.
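To make Compose concrete, here is a minimal sketch of a docker-compose.yml for a single service. The service name and image are assumptions for illustration, not something we've built yet:

```yaml
# docker-compose.yml — minimal, hypothetical single-service example
services:
  web:
    image: my-python-app   # an image you have built locally (assumed name)
    ports:
      - "80:5000"          # map host port 80 to container port 5000
```

With this file in place, docker compose up starts the service; docker compose down tears it back down.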
2 - Docker objects
2.1 The basics
Docker is a collection of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. The essential Docker objects are images, containers, networks, volumes, and plugins.
Images are read-only templates with instructions for creating a Docker container.
Containers are runnable instances of an image.
Networks allow you to manage the communication between containers.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers.
Here is one of the first videos I ever watched on Docker. Highly recommend it.
2.2 Crafting a Dockerfile
A Dockerfile is a script composed of various commands and arguments listed successively to automatically perform actions on a base image in order to create (or form) a new one. It is the starting point for creating Docker images.
Here's a simple Dockerfile that defines an environment to run a Python "Hello World" application:
# Use an official Python runtime as a parent image
FROM python:3.8-slim
# Set the working directory in the container
WORKDIR /usr/src/app
# Copy the current directory contents into the container at /usr/src/app
COPY . .
# Run hello.py when the container launches
CMD ["python", "./hello.py"]
FROM: sets the base image for subsequent instructions. Every valid Dockerfile must begin with a FROM instruction (only comments and ARG may precede it).
CMD: specifies the command to execute when a container starts from the image.
COPY: copies new files from a source (SRC) and adds them to the filesystem of the container at the path (DEST).
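The Dockerfile above expects a hello.py in the build context (the filename comes from its CMD line). A minimal sketch of that script could be:

```python
# hello.py — the script the Dockerfile's CMD runs on container launch
def greet() -> str:
    """Return the greeting the container prints."""
    return "Hello, World!"

if __name__ == "__main__":
    print(greet())
```

Place it next to the Dockerfile so COPY . . picks it up during the build.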
2.3 Build & run a docker image
Once you have a Dockerfile, you can build an image and then run a container based on it:
here’s the generic code
docker build -t ImageName:TagName dir
docker run --name test -it ImageName:TagName
and here’s what you’ll want to run:
docker build -t my-python-app .
docker run -it --name running-app my-python-app
These commands create an image tagged as my-python-app and then run a container named running-app using this image.
here's the breakdown:
-t: Image tag
ImageName: the name you want to give your image
TagName: the tag you want to give your image
dir: the directory containing the Dockerfile (for the current directory, you can use a dot .)
-it: runs the container in interactive mode (-i keeps STDIN open, -t allocates a pseudo-terminal)
--name: gives a name to the container
If you are more of a video person, you can watch this video which basically does the exact same thing.
2.4 Networking in Docker
Docker's networking capabilities allow containers to communicate with each other and the external world. Docker automatically creates a default network, but you can define custom networks as needed. Port mapping is an aspect of networking that allows you to map a port inside your container to a port on the host system, making your application accessible externally.
Port mapping:
Port mapping is crucial for web applications. For example, if your container runs a web server on port 5000, you can map it to port 80 on your host, allowing users to access it via the standard HTTP port:
docker run -p 80:5000 my-web-app
This command maps port 5000 inside the container to port 80 on the host machine.
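To see what the container side of that mapping might look like, here is a hedged sketch of a tiny Python server listening on port 5000 inside a container. The filename, handler, and response body are illustrative assumptions, not part of any real my-web-app:

```python
# tiny_server.py — illustrative container-side server for -p 80:5000
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a small plain-text body
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello from the container\n")


def make_server(port: int = 5000) -> HTTPServer:
    """Create (but do not start) the server bound to the given port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

If this ran inside a container started with docker run -p 80:5000 ..., a request to the host's port 80 would be forwarded to this server on port 5000.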
2.5 Docker vs Virtual Machines
Virtual Machines (VMs) work by emulating virtual hardware, which makes them less efficient than Docker containers, which use the host machine's kernel directly. VMs can take up several gigabytes, while containers can be lightweight (as small as tens of MBs for a simple app). This difference allows for more efficient use of resources.
2.6 Super simple docker commands
Here are some of the basic Docker commands that you'll use regularly and what they do:
docker pull [image name]: Downloads an image from Docker Hub.
docker build [path to Dockerfile]: Builds a Docker image from a Dockerfile.
docker images: Lists all the Docker images on your system.
docker run [image name]: Creates and starts a container from an image.
docker ps: Lists running containers.
docker stop [container name/id]: Stops a running container.
3 - Deploy a ML model in docker
Deploying machine learning models with Docker encapsulates dependencies in a container, making the model portable and easy to deploy. Here’s how you can dockerize a simple machine learning model for production.
If you are more of a video person, you can watch this video here. It basically does the same thing. Just keep in mind we haven’t covered FastAPI, or Heroku yet.
3.1 Prepare your Python application
Let’s begin by creating a simple machine learning application with two Python files. The first will train a model, and the second will make predictions based on user input.
train.py - Training the Model
import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train the model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Save the model
joblib.dump(model, 'model.joblib')
predict.py - Running Predictions
import joblib

# Load the trained model
model = joblib.load('model.joblib')

# Menu-driven prediction loop
while True:
    user_input = input("Enter features separated by commas or 'exit' to quit: ")
    if user_input.lower() == 'exit':
        break
    try:
        features = list(map(float, user_input.split(',')))
        prediction = model.predict([features])
        print(f"The predicted class is: {prediction[0]}")
    except Exception as e:
        print(f"Error: {e}")
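The input-parsing step inside that loop can be isolated into a small helper to see exactly what it does (a sketch; the function name is mine, not from the original script):

```python
def parse_features(user_input: str) -> list[float]:
    """Convert a comma-separated string like '5.1,3.5,1.4,0.2'
    into a list of floats suitable for model.predict.

    Raises ValueError on non-numeric fields, which the try/except
    in predict.py catches and reports to the user.
    """
    return [float(x) for x in user_input.split(",")]
```

For example, parse_features("5.1,3.5,1.4,0.2") yields [5.1, 3.5, 1.4, 0.2], while "a,b" raises a ValueError.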
3.2 Defining dependencies in requirements.txt
The requirements.txt file should list all the Python dependencies required by your application:
scikit-learn==0.24.1
joblib==1.0.1
3.3 Crafting a Dockerfile
Now you will create a Dockerfile. It will outline how to build the Docker image for your application.
# Use an official Python runtime as a base image
FROM python:3.8-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run predict.py when the container launches
CMD ["python", "predict.py"]
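Note that predict.py expects model.joblib to exist inside the container. One option is to run train.py locally before building, so the model file is included by COPY . /app. Another option (a sketch, not the only approach) is to train the model during the image build itself:

```dockerfile
# Variant Dockerfile: train during the build so the image ships with model.joblib
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
# Produce model.joblib inside the image (assumes train.py is in the build context)
RUN python train.py
CMD ["python", "predict.py"]
```

Training at build time keeps the image self-contained, at the cost of retraining on every rebuild.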
3.4 Build the Docker image
With the Dockerfile in place, build the image with the following command:
docker build -t my-ml-app .
3.5 Run the Docker container
Once the image is built, you can run the container in interactive mode:
docker run -it --rm --name running-ml-app my-ml-app
3.6 A quick summary of what’s going on
Here’s a quick summary of how all of these components are connected:
train.py is your model training script that processes data and creates a model file (model.joblib).
predict.py is the script that interacts with the user, loads the model, and makes predictions based on user input.
The requirements.txt file lists the dependencies necessary to run the Python scripts successfully.
The Dockerfile contains instructions to Docker on how to build the image, including setting up the environment, installing dependencies, and specifying the command to run on container start.
Building the Docker image packages your application and its environment into a container that can be run anywhere Docker is installed.
Running the container allows users to interact with your model through the prediction script.
This workflow encapsulates your machine learning model in a portable and reproducible environment, making it simpler to share and deploy across different machines and platforms. It's an integral part of MLOps that ensures models are easy to maintain, update, and scale.