After deciding, in my previous post, to switch my z620 to an Nvidia-Docker workstation, I wanted to write up exactly how I did that, because some of the specific technical steps (such as disabling a graphics card in the BIOS to install the Nvidia driver) aren’t documented in any one place.
Part 1: HW Setup
First I opened up my z620 and removed the quad-port NICs that I’m no longer going to use. The z620 has two ‘compartments’ inside the case: the PCI Express slots sit on one side of a partition, and both of the processors sit on the other side. This is a very well designed case: it keeps the heat from the CPUs away from the GPU and works nicely with blower cards.
The partition on the left of the image is actually part of the removable second-CPU tray. You can also see that there is still a card plugged into the second PCI Express x16 slot. That is an AMD Radeon HD 5450. It is very difficult to make this build work without a second GPU for setup. In my experience the 5450 is like a quad-port Intel PRO/1000 NIC: a reliable, standards-compliant part that just works with many different operating systems’ default drivers.
I removed the 2nd CPU (push the green buttons, then pull out the entire CPU-and-RAM tray) because it is much easier to install the GPU with it out of the way.
I have a blower card. The larger “blower debate” about GPUs is worth exploring in more detail at the design stage, but the TL;DR is: if you have a workstation-style case with limited airflow, or if components will sit close together (such as two GPUs installed side by side), a blower card works better. A non-blower card will give you lower temperatures if you are working with a gaming case with good ventilation. As a note, if you are working with a z420, z620, or T3600, check GPU height: workstation cases are often designed with very specific clearances in mind, and modern non-blower GPUs will often be too tall to close the case without modification.
To power the GPU, you take the two 6-pin connectors that by default live attached to the plastic fan housing on the front bottom of the z620 case. The 1080 Ti requires one 6-pin and one 8-pin for power, so a 6-to-8-pin adapter is required. This is ONLY SAFE with specific research: the z620 uses a non-standard 6-pin rated well above the 6-pin minimum specification, so you can ‘safely’ use a 6-to-8-pin adapter that has three ground wires. This part of the build is the biggest reason I don’t recommend recreating it exactly: a 1060 6GB can be powered by the single 6-pin of a z420 or T3600, and a custom build can use a standard PSU that has an actual 8-pin.
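To see why the adapter matters, it helps to run the power budget. This is just a sketch using the standard spec numbers (PCIe slot: 75 W, 6-pin: 75 W, 8-pin: 150 W, 1080 Ti rated TDP: 250 W):

```shell
# Rough power-budget check for a 1080 Ti wired as slot + 6-pin + 8-pin.
# Spec numbers: PCIe slot 75W, 6-pin 75W, 8-pin 150W; 1080 Ti TDP 250W.
slot=75
six_pin=75
eight_pin=150
budget=$((slot + six_pin + eight_pin))
tdp=250
echo "budget=${budget}W tdp=${tdp}W"   # budget=300W tdp=250W
```

The card needs the full slot + 6-pin + 8-pin budget to stay above its 250 W TDP, which is exactly why feeding the 8-pin from a 6-pin adapter is only safe if, as on the z620, that 6-pin rail is actually built to deliver 8-pin-class current.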
Once I’d connected the GPU power, I swapped in some new hard drives (so I can preserve my Proxmox config if I want to revert later). I used a 120GB SSD for the OS and a 2TB ‘slow’ drive to store larger datasets.
Part 2: Ubuntu/Nvidia Setup
In the BIOS I set the 5450 to be the active card.
I installed Ubuntu Server. The only optional package I selected during the install was OpenSSH. If you are recreating this, you may want to install Ubuntu 17.04 Desktop instead (if you install Server, you are only going to be administering the system through the command line).
In the BIOS, I must disable the 1080 Ti’s PCIe slot. If I don’t, Ubuntu will crash on boot. (The open-source nouveau driver that loads by default for the 1080 Ti is the culprit for this issue.)
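If your BIOS doesn’t let you disable the slot, the conventional alternative (a sketch; I used the BIOS route myself) is to blacklist nouveau so it never loads. Create /etc/modprobe.d/blacklist-nouveau.conf containing:

```
blacklist nouveau
options nouveau modeset=0
```

then run `sudo update-initramfs -u` and reboot so the blacklist takes effect.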
At a command prompt, I add the proprietary graphics drivers PPA to the system (this is where the Nvidia drivers live).
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
Install the latest Nvidia drivers.
sudo apt-get install nvidia-384
Reboot, disabling the 5450’s PCIe slot and enabling the 1080 Ti’s PCIe slot in the BIOS on the way back up. (Once I do this, I need to move the HDMI cable I’m using for the monitor to the other GPU, or I can just SSH in after the reboot.)
Next I update the system in general:
sudo apt-get update
sudo apt-get dist-upgrade
This is what I see:
This means that the Nvidia driver/CUDA stack is working correctly.
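Another quick way to confirm this from the command line (a sketch; it only reads loaded-module state, so it’s harmless to run on any machine):

```shell
# Check whether the proprietary nvidia kernel module is actually loaded.
# On a box without the driver this prints "not-loaded" instead.
if lsmod 2>/dev/null | grep -q '^nvidia '; then
    status="loaded"
else
    status="not-loaded"
fi
echo "nvidia kernel module: ${status}"
```

On the z620, after the reboot above, you’d expect “loaded” here, and `nvidia-smi` should list the 1080 Ti.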
Reboot. At this point I can remove the 5450. While the z620 doesn’t have the power headroom to support another full GPU in the second PCIe x16 slot, the system also doesn’t have a 10Gb NIC (10Gb Ethernet or InfiniBand), and when I upgrade my home network to 10Gb, that’s what will go in that slot.
Part 3: Docker/Nvidia-Docker Setup
This is pretty much the standard docker install steps: https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#install-docker-ce
First I ensure that I have the requirements for docker.
1) sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
Next I add Docker’s GPG key.
2) curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Now Docker is added to the apt sources list.
3) sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
I need to download a custom build of the Nvidia-Docker package that works with Ubuntu 17.04; the official package from Nvidia will fail. (https://github.com/NVIDIA/nvidia-docker/issues/234 explains why; in short, it’s what I get for my OS selection.)
4) wget https://github.com/NVIDIA/nvidia-docker/files/818401/nvidia-docker_1.0.1-yakkety_amd64.deb.zip
Another apt-get update, to pull in the Docker packages from the repository added in step 3.
5)sudo apt-get update
Time to actually install docker:
6)sudo apt-get install docker-ce
Next I install a required Nvidia utility (without this, nvidia-docker cannot function):
7) sudo apt-get install nvidia-modprobe
Now I install Nvidia-Docker. The download from step 4 is zipped, so it needs to be extracted first.
8) unzip nvidia-docker_1.0.1-yakkety_amd64.deb.zip && sudo dpkg -i nvidia-docker_1.0.1-yakkety_amd64.deb
9) sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
This is what I see:
If you’ve been following along for your own setup and you see this, then you have a Docker container that has access to your GPU! Congrats!
If you want to actually test with some code:
10) Make a folder for some test notebooks (I used /home/USERNAME/Data)
11) sudo nvidia-docker run -i -t -v /home/USERNAME/Data:/opt/data -p 8888:8888 tensorflow/tensorflow:nightly-gpu
That gives you a Jupyter notebook on port 8888, launched looking at the directory you created in step 10.
This image won’t have Keras installed (so if you prefer standalone Keras to tf.contrib.keras, you’ll need to install it yourself).
And now you can import Keras
It’s worth mentioning here that it’s not best practice to update a Docker container while it’s running, because each container is more like an instance, in programming terms, than a traditional VM. Every time I run the “nvidia-docker run -i -t tensorflow/tensorflow:nightly-gpu” command, I get a fresh container, and anything installed in a previous container is gone.
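If you want Keras (or anything else) to survive that, the usual fix is to bake it into your own image with a Dockerfile. A minimal sketch, assuming the same TensorFlow base tag used above (the directory and image name here are my own placeholders):

```shell
# Write a minimal Dockerfile that bakes Keras into the TensorFlow image.
# (Sketch: the base tag and image name are assumptions, not from my build.)
mkdir -p "$HOME/docker-tf"
cat > "$HOME/docker-tf/Dockerfile" <<'EOF'
FROM tensorflow/tensorflow:nightly-gpu
RUN pip install keras
EOF
```

Build it once with `sudo docker build -t my-tensorflow "$HOME/docker-tf"`, then point the `nvidia-docker run` command from step 11 at `my-tensorflow` instead of the stock image.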
I know this was a little long, but I hope it was useful to anyone considering building an Nvidia-Docker homelab for deep learning.
You can contact me at my Contact Page if you have any questions, or if you want to talk about anything involving the intersection of Data Science, Machine Learning, and DevOps.
Also published on Medium.