Sunday, June 30, 2019

Deep Learning in Ubuntu: nvidia drivers and cuda

A Deep Learning environment in Ubuntu 18.04 Bionic Beaver:
Nvidia drivers and CUDA libraries

Hardware environment:
Lenovo ThinkStation P300, equipped with a 6-core Intel i7 CPU, 32 GB of RAM, an Nvidia Quadro P2000 GPU, a 512 GB NVMe SSD, and an 8 TB SATA HDD.

I am setting up the system as dual boot: Windows 10 Pro and Ubuntu Linux.
To get the dual boot environment working, I first ran the Windows 10 media creation tool from within the preinstalled Windows 10 Pro system, to prepare a USB key for reinstalling the system.

After that I reinstalled Windows from the USB key, repartitioning the NVMe drive so as to leave about 300 GB free alongside the Windows partitions.

After reinstalling Windows and applying the subsequent updates, I prepared an Ubuntu installer USB key with the pendrivelinux tool, writing a current Ubuntu 18.04 image to it.

Be sure to connect the screen to the first DisplayPort on the Nvidia board (the port closest to the mainboard).

I installed Ubuntu on the NVMe drive, specifying a 64 GB swap partition and devoting the rest of the available space to a Linux ext4 root partition mounted on "/".

After checking that both Windows and Linux boot correctly, I proceeded to set up the nvidia drivers on the Linux system. This is a bit tricky.
#lsb_release -a
Ubuntu 18.04.2 LTS

#apt-get update

#apt-get install nvidia-driver-390

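Since the nvidia driver is a third-party kernel module with proprietary (non open source) code, on a system with UEFI Secure Boot enabled the installation asks you to define a password. This password has to be entered at the subsequent reboot, at a firmware prompt called MOK (Machine Owner Key) management, to enroll the key the module is signed with.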
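
To know in advance whether the MOK prompt is to be expected, it can help to look at the Secure Boot state first. What follows is only a minimal sketch, assuming the mokutil tool is installed on the system; describe_secure_boot is a hypothetical helper name of mine, not part of any package.

```shell
# Hypothetical helper: turns the text printed by `mokutil --sb-state`
# into a short hint about whether a MOK enrollment is to be expected.
describe_secure_boot() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "Secure Boot is on: expect the MOK password prompt" ;;
        *"SecureBoot disabled"*) echo "Secure Boot is off: no MOK enrollment needed" ;;
        *)                       echo "Secure Boot state unknown" ;;
    esac
}

# Only query the firmware state if mokutil is actually available.
if command -v mokutil >/dev/null 2>&1; then
    describe_secure_boot "$(mokutil --sb-state 2>/dev/null)"
fi
```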

nvidia-smi shows the nvidia driver status:

root@tsp330:~# nvidia-smi
Sat Jun 29 01:46:02 2019       
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
| GPU  Name     Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0 Quadro P2000        Off  | 00000000:01:00.0  On |     N/A |
| 49%   42C    P8     6W /  75W |    282MiB /  5056MiB |     0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type Process name                             Usage      |
|        1207      G   /usr/lib/xorg/Xorg                            39MiB |
|        1279      G   /usr/bin/gnome-shell                          52MiB |
|        1466      G   /usr/lib/xorg/Xorg                           112MiB |
|        1599      G   /usr/bin/gnome-shell                          73MiB |
|        2692      G   /usr/lib/firefox/firefox                       1MiB |
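
Since the driver version gets checked again and again during this setup, a small filter can pull it out of the banner. This is a sketch only; driver_version is a hypothetical helper name, and it assumes the banner keeps the "Driver Version:" label seen above.

```shell
# Hypothetical helper: reads nvidia-smi banner text on stdin and prints
# the value that follows "Driver Version:".
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example with the banner line shown above.
printf '%s\n' '| NVIDIA-SMI 390.116                Driver Version: 390.116                   |' | driver_version

# On a machine with the driver loaded, the same helper works on live output.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_version
fi
```

nvidia-smi can also print single fields by itself (nvidia-smi --query-gpu=driver_version --format=csv,noheader), which is probably the more robust choice in scripts.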

To use the GPU cores from code, two main things are needed: a GPU driver and a set of libraries (CUDA).
Both come from nvidia, but there are significant dependency issues between them.

Nvidia driver setup
We need to add some specific software repositories to install current nvidia drivers.


#add-apt-repository ppa:graphics-drivers/ppa
#apt-get update
#apt-get install ssh screen mc
#ip a l

We also install some other tools, notably ssh, to be able to access the system from the network. Take note of the IP address of the system and check that it is reachable from another machine: this will be needed if something goes wrong with the graphical system.
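
To note down the address without reading through the full ip output, something like the following could be used. It is a sketch that assumes the one-line format of ip -o -4 addr show; list_ipv4 is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and prints
# "interface address" for every non-loopback IPv4 address.
list_ipv4() {
    awk '$3 == "inet" && $2 != "lo" { split($4, a, "/"); print $2, a[1] }'
}

# Only run against the live system if the ip tool is present.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | list_ipv4
fi
```

From another machine, the reachability check is then just an ssh login to one of the printed addresses.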

Now open the graphical tool "Software & Updates" (use the search function to locate it: its icon is a cardboard box with the planet Earth).
You will see several driver options under the "Additional Drivers" tab.
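
The same list should also be available from the command line through the ubuntu-drivers tool (package ubuntu-drivers-common), which is handy when working over ssh. This is a sketch under the assumption that its devices report contains lines of the form "driver : name - ..."; list_drivers is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ubuntu-drivers devices` output on stdin and
# prints only the driver package names.
list_drivers() {
    awk '$1 == "driver" && $2 == ":" { print $3 }'
}

# Only query the system if the tool is installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | list_drivers
fi
```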

I first selected the 410 version, since it is labeled as a "long lived branch" on the nvidia website. Apply the changes, then reboot.
At the reboot the workstation performed a BIOS reflash (quite scary), and after that it no longer loaded the graphical system: I only got a text console, and a barely responsive one.

I logged in over the network with ssh and executed apt-get update and apt-get upgrade. Running nvidia-smi, I got this error:

“NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
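
One quick thing to check in this situation is whether the nvidia kernel module is loaded at all, since nvidia-smi has nothing to talk to without it. This is a minimal sketch; nvidia_module_state is a hypothetical helper name, and it only looks at the module list, not at why a module failed to load.

```shell
# Hypothetical helper: reads /proc/modules-style text on stdin and reports
# whether a module named exactly "nvidia" is listed.
nvidia_module_state() {
    if awk '$1 == "nvidia" { found = 1 } END { exit !found }'; then
        echo "nvidia module loaded"
    else
        echo "nvidia module NOT loaded"
    fi
}

# /proc/modules is the kernel's own list of currently loaded modules.
if [ -r /proc/modules ]; then
    nvidia_module_state < /proc/modules
fi
```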

I then reverted to the old driver.

#apt-get install nvidia-driver-390

After entering the MOK password at the firmware prompt, I was able to reboot in graphical mode.
I then went back to the Software & Updates graphical tool and selected nvidia-driver-430 (the newest one). After a reboot everything was fine.
nvidia-smi now shows driver 430.26 and CUDA version 10.2; uname -a shows kernel 4.18.0-25.

I assume that the nvidia drivers are ok now.


Cuda setup:

 #apt-get install linux-headers-$(uname -r)
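
The headers matter because kernel modules are built against the running kernel. A small check that they are in place could look like this. It is a sketch that assumes Ubuntu's usual location under /usr/src; headers_dir is a hypothetical helper name.

```shell
# Hypothetical helper: prints the directory where Ubuntu normally places
# the headers for the given kernel release.
headers_dir() {
    echo "/usr/src/linux-headers-$1"
}

# Check the headers for the kernel that is currently running.
dir="$(headers_dir "$(uname -r)")"
if [ -d "$dir" ]; then
    echo "headers present: $dir"
else
    echo "headers missing: $dir"
fi
```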

Executing #apt-get install nvidia-cuda-toolkit, you can see that it offers CUDA version 9.1. On the nvidia site, I see that CUDA 10.1 is available from the nvidia repositories.
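
The version apt would install can also be read without starting an installation, from the output of apt-cache policy. This is a sketch only; candidate_version is a hypothetical helper name.

```shell
# Hypothetical helper: reads `apt-cache policy <package>` output on stdin
# and prints the version apt would install (the "Candidate:" line).
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Only query apt if it is available on this system.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy nvidia-cuda-toolkit 2>/dev/null | candidate_version
fi
```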


I then execute the following commands as root (over 4 GB of packages are downloaded):

#sudo apt-key adv --fetch-keys

#sudo bash -c 'echo "deb /" > /etc/apt/sources.list.d/cuda.list'

#sudo bash -c 'echo "deb /" > /etc/apt/sources.list.d/cuda_learn.list'

#apt-get update
#apt-get install cuda-10-1

#apt-get update
#apt-get upgrade
#apt-get install libcudnn7

After this, some lines have to be added to the .bashrc file in the home directory of every user that needs the CUDA environment.

# set PATH for cuda 10.1 installation
if [ -d "/usr/local/cuda-10.1/bin/" ]; then
    export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
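
The ${PATH:+:${PATH}} construct used above appends ":" plus the old value only when the variable is already set and non-empty, so no stray colon is left behind when it is empty. A minimal sketch of the same expansion in isolation follows; prepend_dir is a hypothetical helper name.

```shell
# Hypothetical helper mirroring the expansion used above: prepend a
# directory to a colon-separated list without leaving a stray ":".
prepend_dir() {
    dir="$1"
    list="$2"
    echo "${dir}${list:+:${list}}"
}

prepend_dir /usr/local/cuda-10.1/bin "/usr/bin:/bin"   # prints /usr/local/cuda-10.1/bin:/usr/bin:/bin
prepend_dir /usr/local/cuda-10.1/lib64 ""              # prints /usr/local/cuda-10.1/lib64
```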

Then I performed another reboot, and now nvidia-smi shows driver version 418.67 and CUDA version 10.1. nvcc --version shows CUDA version 10.1 too.
Going back to Software & Updates, the current driver appears to be 418. I selected the newest 430 again and rebooted.
It now works in graphical mode, with nvidia-smi showing driver version 430.26 and CUDA version 10.2, while nvcc --version reports CUDA 10.1.
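
The two numbers are not in conflict. As far as I understand, the CUDA version printed by nvidia-smi is the highest one the installed driver supports, while nvcc reports the toolkit that is actually installed. To see both side by side, a sketch like this can be used; toolkit_release and driver_cuda_version are hypothetical helper names, and the patterns assume the output formats seen here.

```shell
# Hypothetical helper: reads `nvcc --version` text on stdin and prints the
# toolkit release (for example 10.1).
toolkit_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: reads nvidia-smi banner text on stdin and prints the
# "CUDA Version" field.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compare the two only when both tools are installed.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    echo "toolkit (nvcc):        $(nvcc --version | toolkit_release)"
    echo "driver supports up to: $(nvidia-smi | driver_cuda_version)"
fi
```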
I now install nvidia profiler and another required library.

#apt-get install nvidia-profiler
#apt-get install libaccinj64-9.1

In order to test compiler and profiler functionality, create a text file with the following code and save it with name:

#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}

After preparing this file, we can compile

$nvcc -o add_cuda

And we get the add_cuda executable, which works.
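
"Works" here means that the program prints "Max error: 0". To make that check repeatable, for example after a later driver change, a small wrapper could be used. It is a sketch; check_max_error is a hypothetical helper name.

```shell
# Hypothetical helper: runs the given command and succeeds only if it
# prints exactly "Max error: 0".
check_max_error() {
    out="$("$@" 2>/dev/null)"
    [ "$out" = "Max error: 0" ]
}

# Run the check only if the compiled binary is in the current directory.
if [ -x ./add_cuda ]; then
    if check_max_error ./add_cuda; then
        echo "add_cuda: OK"
    else
        echo "add_cuda: unexpected output"
    fi
fi
```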
We can then profile its execution, from root:

root@TSP339:/home/coder/cuda# nvprof ./add_cuda
==6205== NVPROF is profiling process 6205, command: ./add_cuda
Max error: 0
==6205== Profiling application: ./add_cuda
==6205== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  248.23ms         1  248.23ms  248.23ms  248.23ms  add(int, float*, float*)
      API calls:   67.05%  248.29ms         1  248.29ms  248.29ms  248.29ms  cudaDeviceSynchronize
                   32.74%  121.24ms         2  60.621ms  72.403us  121.17ms  cudaMallocManaged
                    0.11%  410.24us         2  205.12us  199.94us  210.31us  cudaFree
                    0.06%  214.62us        97  2.2120us      89ns  147.25us  cuDeviceGetAttribute
                    0.02%  87.711us         1  87.711us  87.711us  87.711us  cuDeviceTotalMem
                    0.02%  57.659us         1  57.659us  57.659us  57.659us  cudaLaunchKernel
                    0.01%  20.944us         1  20.944us  20.944us  20.944us  cuDeviceGetName
                    0.00%  2.3990us         1  2.3990us  2.3990us  2.3990us  cuDeviceGetPCIBusId
                    0.00%  1.6290us         3     543ns      86ns  1.1260us  cuDeviceGetCount
                    0.00%     541ns         2     270ns     132ns     409ns  cuDeviceGet
                    0.00%     173ns         1     173ns     173ns     173ns  cuDeviceGetUuid

==6205== Unified Memory profiling result:
Device "Quadro P2000 (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
      48  170.67KB  4.0000KB  0.9961MB  8.000000MB  735.5200us  Host To Device
      24  170.67KB  4.0000KB  0.9961MB  4.000000MB  360.0960us  Device To Host
      12         -         -         -           -  2.480928ms  Gpu page fault groups
Total CPU Page faults: 36
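
The number worth watching in this report is the "GPU activities" time. About 248 ms for one million additions is slow, which is expected here, because the kernel is launched with a single block and a single thread and therefore runs serially. To extract that figure from a report, a filter along these lines could be used. It is a sketch; gpu_time is a hypothetical helper name, and the 2>&1 assumes that nvprof writes its report to stderr.

```shell
# Hypothetical helper: reads an nvprof report on stdin and prints the total
# time of the first "GPU activities" row.
gpu_time() {
    awk '$1 == "GPU" && $2 == "activities:" { print $4; exit }'
}

# Example with the row from the report above.
printf '%s\n' ' GPU activities:  100.00%  248.23ms         1  248.23ms  248.23ms  248.23ms  add(int, float*, float*)' | gpu_time

# Live use, if nvprof and the binary are available.
if command -v nvprof >/dev/null 2>&1 && [ -x ./add_cuda ]; then
    nvprof ./add_cuda 2>&1 | gpu_time
fi
```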

This completes CUDA setup.