
Setting up an Nvidia-Docker workstation for DataScience/DeepLearning

After deciding, in my previous post, to switch my z620 to an Nvidia-Docker workstation, I wanted to write up exactly how I did that, because some of the specific technical steps (such as disabling a graphics card in the BIOS to install the Nvidia driver) aren't all documented in one place.

Part 1: HW Setup

First I open up my z620 and remove the quad-port NICs that I'm no longer going to use. The z620 has two 'compartments' inside the case: the PCI Express slots sit on one side of a partition, and both processors sit on the other side. This is a very well designed case: it keeps the heat from the CPUs away from the GPU and works nicely with blower cards.

The partition on the left of the image is actually part of the removable second-CPU tray. You can also see that a card is still plugged into the second PCI Express x16 slot: an AMD Radeon 5450. It is very difficult to make this build work without a second GPU for setup. In my experience the 5450 is like a quad-port Pro/1000 NIC: reliable, standards-based hardware that just works with many operating systems' default drivers.

I removed the second CPU (push the green buttons, then pull out the entire CPU and RAM tray) because it is much easier to install the GPU with that tray out of the way.

I have a blower card. The larger "blower debate" about GPUs is worth exploring in more detail at the design stage, but the TL;DR is: if you have a workstation-style case with limited airflow, or if components will sit close together (such as two GPUs installed side by side), a blower card works better. A non-blower card will give you lower temperatures in a gaming case with good ventilation. As a note, if you are working with a z420, z620, or T3600, check GPU height: workstation cases are often designed around very specific card heights, and modern non-blower GPUs are often too tall to close the case without modification.

To power the GPU, you take the two 6-pin connectors that by default live attached to the plastic fan housing at the front bottom of the z620 case. The 1080ti requires one 6-pin and one 8-pin for power, so a 6-to-8-pin adapter is required. This is ONLY SAFE with specific research. The z620 uses a non-standard 6-pin that delivers well above the 6-pin minimum specification, so you can 'safely' use a 6-to-8-pin adapter that has three ground wires. This part of the build is the biggest reason I don't recommend recreating it exactly: a 1060 6GB can be powered by the single 6-pin of a z420 or T3600, or a custom build can use a standard PSU that has an actual 8-pin.

Once I'd connected the GPU power, I swapped in some new hard drives (so I can preserve my Proxmox config if I want to revert later). I used a 120GB SSD for the OS and a 2TB 'slow' drive to store larger datasets.

Part 2: Ubuntu/Nvidia Setup 

In the BIOS, I set the 5450 to be the active card.

I install Ubuntu Server. The only optional package I selected during the install was OpenSSH. If you are recreating this, you may want to install the Ubuntu 17.04 desktop edition instead (if you install Server, you will only be administering the system through the command line).

In the BIOS, I must disable the 1080ti's PCIe slot. If I don't, Ubuntu will crash on boot. (The 1080ti's non-proprietary driver, nouveau, is the culprit for this issue.)
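An alternative I didn't use: instead of toggling the slot in the BIOS, you can blacklist the open-source nouveau driver before installing the Nvidia driver. A minimal sketch, assuming a stock Ubuntu install:

sudo bash -c 'echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf'
sudo bash -c 'echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf'
sudo update-initramfs -u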

Boot Ubuntu. 

At a command prompt, I add the proprietary graphics drivers PPA to the system (this is where the Nvidia drivers live).

sudo add-apt-repository ppa:graphics-drivers/ppa    

Update.

sudo apt-get update

Install the latest Nvidia drivers.

sudo apt-get install nvidia-384

Reboot, this time disabling the 5450's PCIe slot and re-enabling the 1080ti's slot in the BIOS. Once I do this, I need to move the HDMI cable I use for a monitor over to the 1080ti, or I can just SSH in after the reboot.

Next I update the system in general:

sudo apt-get update

sudo apt-get dist-upgrade

Run nvidia-smi.

The output lists the 1080ti and the installed driver version, which means the Nvidia driver (and with it CUDA) is working correctly.

Reboot. At this point I can remove the 5450. The z620 doesn't have the power headroom to support another full GPU in the second PCIe x16 slot, but the system also doesn't have a 10Gb NIC (10Gb Ethernet or InfiniBand), and when I upgrade my home network to 10Gb, that's what will go in that slot.

Part 3: Docker/Nvidia-Docker Setup

These are pretty much the standard Docker install steps: https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#install-docker-ce

First I ensure that I have the prerequisites for Docker.

1) sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

I add Docker's GPG key.

2) curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
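If you want to double-check the key, the Docker install docs also suggest verifying its fingerprint (0EBFCD88 is the fingerprint they list):

sudo apt-key fingerprint 0EBFCD88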

Now Docker is added to the apt sources list.

3) sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

I need to download a custom build of the Nvidia-Docker package that works with Ubuntu 17.04; the official package will fail. (https://github.com/NVIDIA/nvidia-docker/issues/234 explains why. In short, it's what I get for my OS choice.)

4) wget https://github.com/NVIDIA/nvidia-docker/files/818401/nvidia-docker_1.0.1-yakkety_amd64.deb.zip

Another update to pick up the Docker packages from the apt source added in step 3.

5) sudo apt-get update

Time to actually install docker:

6) sudo apt-get install docker-ce
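At this point you can sanity-check plain Docker (no GPU involved yet) with the standard hello-world image from the install docs:

sudo docker run hello-world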

Next I install a required Nvidia utility (without this, nvidia-docker cannot function).

7) sudo apt-get install nvidia-modprobe

Now I install Nvidia-Docker.

8) Install the custom package from step 4 with sudo dpkg -i, as shown below.
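Since the file from step 4 downloads as a .zip, it has to be extracted first. Going by the filename in the wget URL above, the commands look roughly like this (assuming unzip is installed, and assuming the .deb inside keeps the same name):

unzip nvidia-docker_1.0.1-yakkety_amd64.deb.zip
sudo dpkg -i nvidia-docker_1.0.1-yakkety_amd64.deb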

Now Test:

9) sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

The output is the same nvidia-smi listing as before, this time produced from inside the container. If you've been following along for your own setup and you see the same thing, then you have a Docker container with access to your GPU. Congrats!

If you want to actually test with some code:

10) Make a folder for some test notebooks (I used /home/USERNAME/Data)

11) sudo nvidia-docker run -i -t -v /home/USERNAME/Data:/opt/data -p 8888:8888 tensorflow/tensorflow:nightly-gpu-py3 jupyter notebook --notebook-dir=/opt/data --ip='*' --port=8888 --no-browser --allow-root

That gives you a Jupyter notebook server on port 8888, serving notebooks from the directory you created in step 10.

Log into it, and you can use ! (which runs shell commands from a notebook cell) to test further.
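For example, from a notebook cell (a quick sanity check; tf.test.gpu_device_name() is from the TensorFlow 1.x API, which is what the nightly-gpu-py3 image shipped at the time):

!nvidia-smi
import tensorflow as tf
print(tf.test.gpu_device_name())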

This image won't have standalone Keras installed (so if you prefer mainline Keras to tf.contrib.keras, you'll need to install it).

And now you can import Keras, as in the cell below.
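A minimal sketch of that cell (the install only lives inside the running container, so it needs repeating if you start a fresh one):

!pip install keras
import keras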

It's worth mentioning here that it's not best practice to update a Docker container while it's running, because each container is more like an instance, in programming terms, than a traditional VM. Every time I run "nvidia-docker run -i -t tensorflow/tensorflow:nightly-gpu-py3" I'm creating a new container from the image. The reason I mount /home/USERNAME/Data to /opt/data is that every other directory in the container only persists within that container. I keep my Jupyter notebooks and my data in that directory, but no other changes to the container are meant to 'persist'. (This is a topic I will cover in a later blog post.) This gets into reproducibility: I can send anyone my Docker image, my dataset, and a Jupyter notebook, and they can reproduce what I've done.
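As a side effect, those one-off containers pile up over time. The standard Docker commands list them and remove one by the ID shown in the first column of the listing:

sudo docker ps -a
sudo docker rm CONTAINER_ID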

I know this was a little long, but I hope it was useful to anyone considering building an Nvidia-Docker homelab for DeepLearning.

You can contact me at my Contact Page  if you have any questions, or if you want to talk about anything involving the intersection of Data Science, Machine  Learning, and DevOps.

