How to benchmark CUDA in Python?

Clone repository with benchmarks

git clone [email protected]:gustawdaniel/cuda-benchmark-python.git
cd cuda-benchmark-python

Create conda environment

conda create -n cuda-benchmark python=3.10
conda activate cuda-benchmark

Install pytorch using link: https://pytorch.org/get-started/locally/

First check your cuda version by nvidia-smi. For example.:

nvidia-smi

In my case it is

CUDA Version: 12.6 

You can check by nvcc --version:

nvcc --version

For example:

Cuda compilation tools, release 12.6, V12.6.77

If you do not have installed these packages check wiki https://wiki.archlinux.org/title/GPGPU#CUDA

yay -S nvidia cuda nvidia-utils

Last one is optional, but beneficial for most users. If you still do not have access to nvcc add following to ~/.bashrc or ~/.zshrc:

export PATH=/opt/cuda/bin:$PATH

and reload terminal.

Unfortunately there is no pytorch for cuda 12.6. So I will use 12.4

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Now you can run benchmarks

python main.py

It can be useful to check your cpu or gpu models. You can do it by:

lscpu | grep 'Model name'

Exemplary output:

Model name:                           12th Gen Intel(R) Core(TM) i7-12700H

or

Model name:                           AMD Ryzen 9 7945HX with Radeon Graphics

For gpu:

nvidia-smi --query-gpu=name --format=csv,noheader

Exemplary output:

NVIDIA GeForce RTX 3060 Laptop GPU

or

NVIDIA GeForce RTX 4090 Laptop GPU

there are my results form benchmark:

CUDA is available. Benchmark matrix 10000x10000 multiplied 10 times.
CPU Model: 12th Gen Intel(R) Core(TM) i7-12700H
GPU Model: NVIDIA GeForce RTX 3060 Laptop GPU
CPU Benchmark Duration: 33.20103993 seconds
GPU Benchmark Duration: 2.38760203 seconds
Speedup: 13.9056x faster on GPU
CUDA is available. Benchmark matrix 10000x10000 multiplied 10 times.
CPU Model: AMD Ryzen 9 7945HX with Radeon Graphics
GPU Model: NVIDIA GeForce RTX 4090 Laptop GPU
CPU Benchmark Duration: 16.88850302 seconds
GPU Benchmark Duration: 1.13820602 seconds
Speedup: 14.8378x faster on GPU

Recommended tools to monitor results:

btop
nvtop