CUDA
CUDA (Compute Unified Device Architecture) is an extension of C++ (with bindings for many other languages, including Fortran, Java, and Python) designed for writing programs that offload functions to GPUs for acceleration.
CUDA was introduced in 2006 with NVIDIA’s Tesla architecture, and using it effectively requires understanding how the GPU is organized. CUDA is composed of software (language, API, and runtime), firmware (drivers and runtime), and hardware (a CUDA-enabled GPU).
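To give a feel for the C++ extension, here is a minimal sketch of a CUDA program: the `__global__` qualifier marks a function that runs on the GPU, and the `<<<blocks, threads>>>` launch syntax is the main addition to the language.

```cuda
#include <cstdio>

// Kernel: runs on the GPU; each thread computes its own global index.
__global__ void hello() {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("Hello from thread %d\n", idx);
}

int main() {
    hello<<<2, 4>>>();        // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();  // wait for the GPU to finish before exiting
    return 0;
}
```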
CUDA provides two main APIs for GPU device management:
- the CUDA Driver API, a low-level interface offering fine-grained control
- the CUDA Runtime API, a higher-level interface built on top of the Driver API
The two APIs are mutually exclusive: an application should use one or the other. On top of these, CUDA includes libraries that implement popular algorithms and functions, enhancing productivity and performance.
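A minimal sketch contrasting the two APIs on the same task (allocating device memory); the two halves are shown side by side purely for comparison, a real application would pick one API. Error checking is omitted for brevity, and the Driver API calls require linking against `libcuda`.

```cuda
#include <cuda.h>          // Driver API: cu* functions, explicit contexts
#include <cuda_runtime.h>  // Runtime API: cuda* functions, implicit context

int main() {
    // Runtime API: one call; the context is managed implicitly.
    float *d_buf = nullptr;
    cudaMalloc(&d_buf, 1024 * sizeof(float));
    cudaFree(d_buf);

    // Driver API: initialization, device, and context are all explicit.
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);
    CUdeviceptr d_ptr;
    cuMemAlloc(&d_ptr, 1024 * sizeof(float));
    cuMemFree(d_ptr);
    cuCtxDestroy(ctx);
    return 0;
}
```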
CUDA libraries
Examples of GPU-accelerated CUDA libraries that optimize performance and enhance software productivity include (a short Thrust sketch follows the list):
- Math Libraries: cuBLAS, cuFFT, CUDA Math Library, etc.
- Parallel Algorithm Libraries: Thrust
- Image and Video Libraries: nvJPEG, Video Codec SDK
- Communication Libraries: NVSHMEM, NCCL
- Deep Learning Libraries: cuDNN, TensorRT
- Partner Libraries: OpenCV, FFmpeg, ArrayFire
Detailed list and more info: NVIDIA GPU-accelerated libraries
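As one example from the list, a minimal Thrust sketch: the `device_vector` container owns GPU memory, and `thrust::sort` runs in parallel on the device.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main() {
    int data[] = {5, 2, 9, 1, 7};
    thrust::host_vector<int> h(data, data + 5);  // host-side container
    thrust::device_vector<int> d = h;            // copies host -> device
    thrust::sort(d.begin(), d.end());            // parallel sort on the GPU
    h = d;                                       // copies device -> host
    return 0;
}
```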
Common Library Workflow (a cuBLAS sketch follows the list):
- Create a library-specific handle.
- Allocate device memory for inputs/outputs.
- Convert inputs to the library-specific format.
- Execute the library function.
- Retrieve outputs and convert them if necessary.
- Release CUDA resources and continue with the application.
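A minimal cuBLAS sketch following those six steps, scaling a vector by a constant with `cublasSscal` (compile with `nvcc ... -lcublas`; error checking omitted for brevity):

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int n = 4;
    std::vector<float> h_x = {1.0f, 2.0f, 3.0f, 4.0f};

    // 1. Create a library-specific handle.
    cublasHandle_t handle;
    cublasCreate(&handle);

    // 2. Allocate device memory for inputs/outputs.
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));

    // 3. Convert/transfer inputs to the library's expected layout.
    cublasSetVector(n, sizeof(float), h_x.data(), 1, d_x, 1);

    // 4. Execute the library function: x = alpha * x.
    const float alpha = 2.0f;
    cublasSscal(handle, n, &alpha, d_x, 1);

    // 5. Retrieve outputs back to the host.
    cublasGetVector(n, sizeof(float), d_x, 1, h_x.data(), 1);

    // 6. Release CUDA resources and continue with the application.
    cudaFree(d_x);
    cublasDestroy(handle);
    return 0;
}
```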