From Chaos to Consistency: Docker for Data Scientists
Background
What’s Docker?
Docker Technical Features
Installing Docker
Deploying With Docker Example
Summary & Further Thoughts
References & Further Reading
Connect With Me!

An introduction and application of Docker for Data Scientists

Towards Data Science
Photo by Ian Taylor on Unsplash

But it works on my machine?

This is a classic meme in the tech community, especially for Data Scientists who want to ship their amazing machine-learning model, only to find that the production machine has a different operating system. Far from ideal.

However…

There is a solution, thanks to these wonderful things called containers and tools to manage them, such as Docker.

In this post, we will dive into what containers are and how you can build and run them using Docker. The use of containers and Docker has become an industry standard and common practice for data products. As a Data Scientist, learning these tools is therefore a valuable addition to your arsenal.

Docker is a service that helps you build, run, and execute code and applications in containers.

Now you may be wondering: what is a container?

Essentially, a container is very similar to a virtual machine (VM). It is a small, isolated environment where everything is self-"contained" and can be run on any machine. The primary selling point of containers and VMs is their portability, allowing your application or model to run seamlessly on any on-premise server, local machine, or cloud platform such as AWS.

The main difference between containers and VMs is how they use their host computer's resources. Containers are a lot more lightweight, as they don't actively partition the hardware resources of the host machine. I won't delve into the full technical details here; however, if you want to understand a bit more, I have linked a great article explaining their differences here.

Docker is then simply a tool we use to create, manage, and run these containers with ease. It is one of the main reasons containers have become so popular, as it enables developers to easily deploy applications and models that run anywhere.

Diagram by author.

There are three main elements we need to run a container using Docker:

  • Dockerfile: A text file that contains the instructions for how to build a Docker image.
  • Docker Image: A blueprint or template to create a Docker container.
  • Docker Container: An isolated environment that provides everything an application or machine learning model needs to run, including things such as dependencies and OS versions.
Diagram by author.

There are also a few other key points to note:

  • Docker Daemon: A background process (daemon) that handles incoming requests to Docker.
  • Docker Client: A shell interface that enables the user to talk to Docker through its daemon.
  • DockerHub: Similar to GitHub, a place where developers can share their Docker images.

Installing Docker

Homebrew

The first thing you should install is Homebrew (link here). It is dubbed the "missing package manager for macOS" and is very useful for anyone coding on their Mac.

To put in Homebrew, simply run the command given on their website:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Confirm Homebrew is installed by running brew help.

Docker

Now, with Homebrew installed, you can install Docker by running brew install docker. Confirm Docker is installed by running which docker; the output shouldn't raise any errors and should look like this:

/opt/homebrew/bin/docker

Colima

The final part is to install Colima. Simply run brew install colima and confirm it's installed with which colima. Again, the output should look like this:

/opt/homebrew/bin/colima

Now you may be wondering: what on earth is Colima?

Colima is a software package that enables container runtimes on macOS. In layman's terms, Colima creates the environment for containers to work on our system. To achieve this, it runs a Linux virtual machine with a daemon that Docker can communicate with using the client-server model.

Alternatively, you can also install Docker Desktop instead of Colima. However, I prefer Colima for a few reasons: it's free, it's more lightweight, and I like working in the terminal!

See this blog post here for more arguments in favour of Colima.

Deploying With Docker Example

Workflow

Below is an example of how Data Scientists and Machine Learning Engineers can deploy their model using Docker:

Diagram by author.

The first step is obviously to build your amazing model. Then, you need to wrap up everything you are using to run the model, things like the Python version and package dependencies. The final step is to use that requirements file inside the Dockerfile.

If this all seems abstract to you at the moment, don't worry: we'll go over the process step by step!

Basic Model

Let's start by building a basic model. The provided code snippet shows a simple implementation of a Random Forest classifier on the famous Iris dataset:

Dataset from Kaggle with a CC0 licence.

GitHub Gist by author.

This file is called basic_rf_model.py for reference.
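The gist itself is not embedded in this version of the article, so here is a minimal sketch of what basic_rf_model.py could look like. The train/test split size, random seed, and the use of scikit-learn's bundled copy of Iris (rather than the Kaggle CSV) are assumptions, not the author's exact code:

```python
# basic_rf_model.py -- minimal sketch of a Random Forest on Iris.
# Split size and random_state are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris features and labels bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=26
)

# Fit the classifier and print its accuracy on the held-out set
model = RandomForestClassifier(random_state=26)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds)}")
```

Running this script prints a single accuracy line, which is exactly what the container will output later in the article.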

Create Requirements File

Now that we have our model ready, we need to create a requirements.txt file to house all the dependencies our model needs to run. In this simple example, we luckily only depend on the scikit-learn package. Therefore, our requirements.txt will simply look like this:

scikit-learn==1.2.2

You can check the version you are running on your computer with the pip show scikit-learn command.

Create Dockerfile

Now we can finally create our Dockerfile!

So, in the same directory as requirements.txt and basic_rf_model.py, create a file named Dockerfile. Inside the Dockerfile we will have the following:

GitHub Gist by author.
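The gist is not embedded in this version of the article, but the Dockerfile can be reconstructed from the line-by-line walkthrough that follows:

```dockerfile
FROM python:3.9
MAINTAINER egor@some.email.com
WORKDIR /src
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "basic_rf_model.py"]
```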

Let's go over it line by line to see what it all means:

  • FROM python:3.9: This is the base image for our image.
  • MAINTAINER egor@some.email.com: This indicates who maintains this image.
  • WORKDIR /src: Sets the working directory of the image to /src.
  • COPY . .: Copies the current directory's files into the image's working directory.
  • RUN pip install -r requirements.txt: Installs the requirements from the requirements.txt file into the Docker environment.
  • CMD ["python", "basic_rf_model.py"]: Tells the container to execute the command python basic_rf_model.py and run the model.

Initiate Colima & Docker

The next step is to set up the Docker environment. First, we need to boot up Colima:

colima start

After Colima has began up, check that the Docker commands are working by running:

docker ps

It should return something like this:

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

This is good and means both Colima and Docker are working as expected!

Note: the docker ps command lists all currently running containers.

Build Image

Now it's time to build our first Docker image from the Dockerfile we created above:

docker build . -t docker_medium_example

The -t flag specifies the name of the image, and the . tells Docker to build from the current directory.

If we now run docker images, we should see something like this:

Image by author.

Congrats, the image has been built!

Run Container

After the image has been created, we can run it as a container using the IMAGE ID listed above:

docker run bb59f770eb07

Output:

Accuracy: 0.9736842105263158

This is because all it has done is run the basic_rf_model.py script!

Extra Information

This tutorial only scratches the surface of what Docker can do and be used for. There are many more features and commands to learn to really understand Docker. A great, detailed tutorial is available on the Docker website, which you can find here.

One cool feature is that you can run the container in interactive mode and enter its shell. For example, if we run:

docker run -it bb59f770eb07 /bin/bash

You will enter the Docker container, and it should look something like this:

Image by author.

We also used the ls command to show all the files in the Docker working directory.

Summary & Further Thoughts

Docker and containers are incredible tools for ensuring that Data Scientists' models can run anywhere and anytime without issues. They do this by creating small, isolated compute environments, called containers, that hold everything a model needs to run effectively. Containers are easy to use and lightweight, which has made them standard industry practice nowadays. In this article, we went over a basic example of how you can package your model into a container using Docker. The process was simple and seamless, so it is something Data Scientists can learn and pick up quickly.

The full code used in this article can be found on my GitHub here:

(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA 4.0)
