NVIDIA CUDA-enabled Docker container issue – UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount()

  cuda, docker, nvidia-docker, pytorch, ubuntu

I am trying to use the base images provided by NVIDIA that let us use their GPUs from Docker containers. Because I am using Docker, I do not need the CUDA Toolkit or cuDNN installed on my host system; all I need is the right driver, which I have.
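
For completeness, this is how the driver can be checked on the host and how the NVIDIA container runtime can be exercised against a bare CUDA image (a minimal sketch; the nvidia/cuda tag below is only an example of a bare CUDA image) –

$ nvidia-smi
$ docker run --rm --gpus all nvidia/cuda:11.3.1-base-ubuntu20.04 nvidia-smi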

I can run the official PyTorch Docker containers and they utilize my GPU. However, when I run anything built on the NVIDIA base images, I get the following warning –

$ docker run --gpus all -it --rm -p 8000:8000 ubuntu-cuda-gpu:latest
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0

The application executes; it just falls back to the CPU. But I want to be able to use my GPU the same way I can when I run the same code (a simple PyTorch example) using the official PyTorch Docker images.
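
To make the comparison concrete, this is the kind of check I mean (a minimal sketch; pytorch/pytorch:latest stands in here for whichever official PyTorch image is used) –

$ docker run --gpus all -it --rm pytorch/pytorch:latest \
      python -c "import torch; print(torch.cuda.is_available())"
$ docker run --gpus all -it --rm ubuntu-cuda-gpu:latest \
      python3 -c "import torch; print(torch.cuda.is_available())"

The first prints True for me; the second prints the warning above and False.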

The Dockerfile, starting from the NVIDIA base image, is –

FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04
# Setup
RUN apt update && \
    apt install -y bash \
                   build-essential \
                   git \
                   curl \
                   ca-certificates \
                   python3 \
                   python3-pip && \
    rm -rf /var/lib/apt/lists

# Your stuff
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
    torch \
    transformers \
...
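
The image is built and started in the usual way (the tag is just what I use locally, matching the commands above) –

$ docker build -t ubuntu-cuda-gpu:latest .
$ docker run --gpus all -it --rm -p 8000:8000 ubuntu-cuda-gpu:latest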

If I just run the image without any machine learning code and execute nvidia-smi, I get the following output –

$ docker run --gpus all -it --rm -p 8000:8000 ubuntu-cuda-gpu:latest nvidia-smi
Sat Jun 12 19:15:21 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3060    Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   31C    P8     9W / 170W |     14MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

This leads me to believe that at least something is set up correctly. But why am I not able to use my GPU from this image, and how can I make sure that I can?
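
In case it is relevant, the versions involved can be compared like this (a small sketch; torch.version.cuda reports the CUDA version the installed PyTorch wheel was built against) –

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
$ docker run --gpus all -it --rm ubuntu-cuda-gpu:latest \
      python3 -c "import torch; print(torch.version.cuda)"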

I am on Ubuntu 20.04.

Source: Docker Questions
