
Using all GPUs when running Ollama and OpenWebUI in Docker

Case: Run both Ollama and OpenWebUI in a single Docker container and use all available GPUs on the host machine

TL;DR:

# Installing nvidia-container-toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Installing ollama and open-webui in docker
sudo docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
sudo docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui

# Pulling some models from Ollama repository
sudo docker exec open-webui ollama pull phi4

I had been running Ollama as a standalone installation in my homelab, managed as a systemd service, with OpenWebUI as the user interface to that local Ollama instance. However, after a recent Ollama update, the service no longer started automatically on my LMDE 6 machine after a reboot.

Naturally, I had to look for an alternative with minimal maintenance needs. Thankfully, OpenWebUI offers a Docker image that bundles both Ollama and OpenWebUI and, together with the NVIDIA Container Toolkit, can use all installed GPUs.

This means I only have to maintain one container for both the Ollama and OpenWebUI installation. To automate updates, I use Watchtower to pull the latest version of the image published by the OpenWebUI team.
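
The Watchtower command in the TL;DR uses --run-once, which performs a single update pass and then exits. If you would rather have updates applied on a schedule, one option is a long-running Watchtower container scoped to just the open-webui container. A minimal sketch, assuming a daily poll interval (86400 seconds) is acceptable:

# Hypothetical scheduled variant: check the open-webui container for image updates once a day
sudo docker run -d --name watchtower --restart always \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --interval 86400 open-webui

Passing open-webui as the final argument restricts Watchtower to that one container instead of updating every container on the host.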

Docker run can fail

I observed that running only sudo docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama did not work, because the --gpus=all flag requires nvidia-container-toolkit to be installed on the host first.

So I ran

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Followed by

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
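
On some setups Docker also needs to be pointed at the NVIDIA runtime before --gpus=all works. The toolkit ships the nvidia-ctk helper for this; the sketch below is the documented configuration step followed by a Docker restart, and whether you actually need it depends on how Docker was installed:

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker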

And then started the open-webui container and ran a one-shot Watchtower update:

sudo docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
sudo docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui

Docker started the container, and the one-shot Watchtower run checked the open-webui container for image updates.
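
To confirm that the running container actually sees every GPU, nvidia-smi can be executed inside it. This assumes the NVIDIA runtime has mounted the driver utilities into the container, which is normally the case when it is started with --gpus=all:

# Should list the same GPUs as nvidia-smi on the host
sudo docker exec open-webui nvidia-smi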

I can now use docker exec to run the Ollama CLI inside the container. I downloaded some models with:

sudo docker exec open-webui ollama pull phi4
sudo docker exec open-webui ollama pull qwq
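
Other Ollama CLI subcommands work the same way through docker exec. For example, to list the pulled models or chat with one straight from the terminal (the model name is just one of the models pulled above):

# List downloaded models, then start an interactive chat session
sudo docker exec open-webui ollama list
sudo docker exec -it open-webui ollama run phi4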

Since I reused the Docker volume I had previously used for open-webui, all my historical prompts were still available after this upgrade. Sweet!
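
If you want to double-check which volumes the container is reusing, Docker can show them directly; the volume names below are the ones from the run command above:

# List named volumes and inspect the two used by open-webui
sudo docker volume ls
sudo docker volume inspect ollama open-webui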

This post is licensed under CC BY 4.0 by the author.