Docker GPU Discovery and Mounting
=================================

This page explains how GPUs are discovered in containerized environments, and how these devices are exposed to the grading pipeline.

GPU Discovery
-------------

Like other host resources, GPU discovery is performed in ``DockerDaemonDispatcher`` when each daemon host is created. However, since Docker does not expose information about the GPUs installed on the host, we need to use other means to obtain the list of GPUs on a system.

``lspci``
^^^^^^^^^

First of all, we need to obtain a list of all GPUs present on the host system. Since ``/sys`` is exposed in Docker containers, we can use ``lspci`` to list the PCI devices on the host system. ``lspci`` also has a machine-parsable output format, which helps the Grader pick out GPUs among all PCI devices.

We use ``lspci -Dvmm``, which has the following effect (see the ``lspci(8)`` man page):

- ``-D``: Displays the full PCI domain/bus (used later to locate the card/render device of each GPU)
- ``-v``: Enables verbose output
- ``-mm``: Outputs the PCI device data in a machine-readable format

After obtaining the output of ``lspci``, we keep only the devices whose class is ``VGA compatible controller``, which yields the list of all GPUs on the system.

DRI Card and Render Devices
^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to the PCI bus number, the card and render device numbers are required for AMD/Intel GPUs to be used in containers. These are also obtained during the discovery phase, by listing all symlinks under ``/dev/dri/by-path``. Each device is listed in the form ``pci-${pciBus}-{card,render}``, so each entry can be mapped back to a GPU by substituting the full PCI bus of the device for ``${pciBus}``. In addition, we also resolve each symlink to the actual device node under ``/dev/dri``.

NVIDIA GPUs
^^^^^^^^^^^

For NVIDIA GPUs, we additionally need the device ID of each GPU as seen by the NVIDIA kernel module, because the NVIDIA Container Toolkit uses these internal GPU IDs to determine which GPU to mount into the container. From some sources, it appears that GPU IDs are enumerated in PCI bus order, so the order of devices in the ``lspci`` output is sufficient for mapping each GPU device to the index understood by the NVIDIA tools.

GPU Allocation
--------------

GPUs are allocated using the existing ``DockerDaemonDispatcher`` allocation logic.

Mounting to a Container
-----------------------

Once a GPU has been allocated to a runner, ``DockerPipelineStage`` is responsible for mounting the device into the container.

NVIDIA
^^^^^^

NVIDIA GPUs use Docker's device request functionality to mount the GPU into the container. The NVIDIA device ID determined during discovery is used here.

AMD/Intel
^^^^^^^^^

Other GPUs use Docker's device functionality to mount the GPU into the container. The ``/dev/dri`` paths of the GPU are used here, and are mounted to the same paths in the container. The container is given read/write access to the GPU (it is unknown whether ``mknod`` permission is also required).
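To make the steps on this page concrete, the sketches below illustrate them in Python. The Grader itself presumably implements this logic in ``DockerDaemonDispatcher`` and ``DockerPipelineStage``; the function names, image tags, and device paths below are illustrative only. The first sketch covers discovery: parsing ``lspci -Dvmm`` output and keeping only ``VGA compatible controller`` entries.

.. code-block:: python

   import subprocess

   def discover_gpus():
       """Return the lspci records of VGA-class devices (illustrative sketch).

       ``lspci -Dvmm`` prints one record per device, with records separated
       by blank lines and one ``Key:<tab>Value`` pair per line. ``-D`` makes
       the ``Slot`` field contain the full domain:bus:device.function,
       e.g. ``0000:01:00.0``.
       """
       out = subprocess.run(
           ["lspci", "-Dvmm"], capture_output=True, text=True, check=True
       ).stdout

       gpus = []
       for record in out.strip().split("\n\n"):
           fields = {}
           for line in record.splitlines():
               key, _, value = line.partition(":")
               fields[key] = value.strip()
           if fields.get("Class") == "VGA compatible controller":
               gpus.append(fields)
       return gpus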
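The second sketch maps a discovered GPU's full PCI bus to its DRI card and render nodes by resolving the ``pci-${pciBus}-{card,render}`` symlinks under ``/dev/dri/by-path``, as described in the discovery section above.

.. code-block:: python

   import os
   from pathlib import Path

   def dri_devices_for(pci_bus):
       """Map a full PCI bus (e.g. "0000:01:00.0") to its /dev/dri nodes.

       Illustrative sketch: looks up the ``pci-<bus>-card`` and
       ``pci-<bus>-render`` symlinks and resolves them to the real device
       nodes (e.g. /dev/dri/card1, /dev/dri/renderD128).
       """
       devices = {}
       by_path = Path("/dev/dri/by-path")
       for kind in ("card", "render"):
           link = by_path / f"pci-{pci_bus}-{kind}"
           if link.is_symlink():
               devices[kind] = os.path.realpath(link)
       return devices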
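Finally, a sketch of the two mounting mechanisms. It uses the Python Docker SDK purely to show the shape of the requests: a device request carrying the NVIDIA device ID for NVIDIA GPUs, and a device bind of the resolved ``/dev/dri`` nodes (mounted at the same paths, with read/write access) for AMD/Intel GPUs. The image tags, device index, and ``/dev/dri`` paths are placeholders.

.. code-block:: python

   import docker
   from docker.types import DeviceRequest

   client = docker.from_env()

   # NVIDIA: request the GPU by its NVIDIA device index via a device request,
   # which is handled by the NVIDIA Container Toolkit on the host.
   client.containers.run(
       "nvidia/cuda:12.2.0-base-ubuntu22.04",  # placeholder image
       "nvidia-smi",
       device_requests=[
           DeviceRequest(device_ids=["0"], capabilities=[["gpu"]]),
       ],
       remove=True,
   )

   # AMD/Intel: bind the resolved DRI nodes into the container at the same
   # paths with read/write access ("rw"; whether "m" for mknod is also
   # needed is unclear, per the note above).
   client.containers.run(
       "ubuntu:22.04",
       "ls -l /dev/dri",
       devices=[
           "/dev/dri/card1:/dev/dri/card1:rw",
           "/dev/dri/renderD128:/dev/dri/renderD128:rw",
       ],
       remove=True,
   )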