The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. This page gives quick access to many of the SDK resources; you can also consult the SDK documentation or download the complete SDK.
Please note that you may need to install the latest NVIDIA drivers and CUDA Toolkit to compile and run the code samples.
Using Inline PTX with OpenCL 
A simple test application that demonstrates a new capability of the CUDA 4.0 driver: embedding PTX in an OpenCL kernel.

Marching Cubes Isosurfaces 
This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the oclScan SDK sample to perform stream compaction.

OpenCL Tridiagonal 
Efficient matrix solvers for a large number of small, independent tridiagonal linear systems. The sample provides OpenCL implementations of three solvers: Parallel Cyclic Reduction, Cyclic Reduction, and Sweep (Gauss elimination with a reordering optimization for full coalescing).
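As a point of reference for the Gauss elimination approach, here is a minimal serial C++ sketch that solves a single tridiagonal system by forward elimination and back substitution. It is not the sample's code: the function name `solveTridiagonal`, the array layout, and the absence of pivoting are assumptions made for illustration, and the parallel cyclic reduction variants are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves one tridiagonal system A x = d (illustrative helper, not SDK code).
// a: sub-diagonal (a[0] unused), b: main diagonal,
// c: super-diagonal (c[n-1] unused), d: right-hand side.
// No pivoting is done, so the matrix is assumed to be diagonally dominant.
std::vector<double> solveTridiagonal(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     const std::vector<double>& c,
                                     const std::vector<double>& d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);

    std::vector<double> cp(n), dp(n), x(n);
    // Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    // Back substitution, from the last unknown to the first.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = dp[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

With many small independent systems, a GPU version could assign one system to each work-item or work-group; that mapping is left out of this sketch.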

OpenCL Device Query 
This sample enumerates the properties of the OpenCL devices present in the system.

OpenCL Bandwidth Test 
This is a simple test program that measures the memory-copy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, as well as host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, using memory-mapped and direct access.

OpenCL Vector Addition 
Element-by-element addition of two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with a functional comparison against a simple C++ host CPU implementation.

OpenCL Dot Product 
Dot product (scalar product) of a set of input vector pairs. Implemented in OpenCL for CUDA GPUs, with a functional comparison against a simple C++ host CPU implementation.
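A host-side reference of the kind used for such a functional comparison might look like the following C++ sketch. The function name `dotProducts` and the back-to-back storage of the vector pairs are assumptions for illustration, not the sample's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes one dot product per vector pair (illustrative helper).
// Assumed layout: pair p occupies elements [p * length, (p + 1) * length)
// of both a and b.
std::vector<float> dotProducts(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t length) {
    assert(length > 0 && a.size() == b.size() && a.size() % length == 0);
    const std::size_t pairs = a.size() / length;
    std::vector<float> out(pairs, 0.0f);
    for (std::size_t p = 0; p < pairs; ++p) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < length; ++i) {
            sum += a[p * length + i] * b[p * length + i];
        }
        out[p] = sum;
    }
    return out;
}
```

Each pair is independent of the others, which is what makes the problem a natural fit for one work-item (or work-group) per pair.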

OpenCL Matrix Vector Multiplication 
Simple matrix-vector multiplication example showing increasingly optimized implementations.

OpenCL Overlapped Copy/Compute Sample 
Element-by-element hypotenuse for two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with a functional comparison against a simple C++ host CPU implementation. Demonstrates overlapped copy and compute using two command queues.

OpenCL Simple Multi-GPU 
This application demonstrates how to make use of multiple GPUs in OpenCL.

OpenCL Simple OpenGL Interop 
A simple program that demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.

OpenCL Scan 
This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
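The definition above describes an exclusive scan. A minimal serial C++ sketch of that definition follows; the function name `exclusiveScan` and the caller-allocated output buffer are illustrative choices, and the sketch says nothing about how the sample parallelizes the computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] is the sum of in[0] .. in[i-1], so out[0] is 0.
// The caller provides an output buffer of the same size as the input.
void exclusiveScan(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() == in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of everything before element i
        running += in[i];
    }
}
```

For the input 3, 1, 7, 0, 4 this produces 0, 3, 4, 11, 11.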

OpenCL Parallel Reduction 
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
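To illustrate the general shape of a tree-based sum reduction, here is a serial C++ sketch. It is an assumption-laden illustration rather than the sample's kernel: the name `treeReduce` is hypothetical, and the additions that a GPU would perform in parallel within each pass run here in an ordinary loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the values with a tree-shaped reduction. In each pass, element i
// accumulates element i + half; the additions within one pass do not depend
// on each other, so a parallel version could run them concurrently.
float treeReduce(std::vector<float> data) {  // taken by value: reduced in place
    assert(!data.empty());
    std::size_t n = data.size();
    while (n > 1) {
        const std::size_t half = (n + 1) / 2;  // rounding up handles odd lengths
        for (std::size_t i = 0; i + half < n; ++i) {
            data[i] += data[i + half];
        }
        n = half;  // only the first half still holds partial sums
    }
    return data[0];
}
```

The number of passes grows with the logarithm of the array length, which is the property parallel reductions exploit.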

OpenCL Matrix Transpose 
Efficient matrix transpose.

OpenCL Matrix Multiplication 
This sample implements matrix multiplication exactly as presented in Chapter 6 of the programming guide.
It was written for clarity of exposition, to illustrate various OpenCL programming principles, rather than to provide the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication.

OpenCL 3D FDTD 
This sample applies a finite-difference time-domain (FDTD) progression stencil to a 3D surface.

OpenCL DCT 8x8 
This sample demonstrates how the Discrete Cosine Transform (DCT) for 8x8 blocks can be implemented in OpenCL.

OpenCL DirectX Texture Compressor (DXTC) 
High-quality DXT compression using OpenCL.
This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement. A whitepaper is available for this sample.

OpenCL Radix Sort 
This sample demonstrates a very fast and efficient parallel radix sort implemented in OpenCL for CUDA GPUs.
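For readers unfamiliar with the algorithm, the following serial C++ sketch shows a least-significant-digit radix sort on 32-bit keys. The 8-bit digit width and the name `radixSort` are assumptions chosen for brevity; the sample's parallel implementation is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Least-significant-digit radix sort of 32-bit keys, 8 bits per pass.
// Each pass is a stable counting sort on one digit.
void radixSort(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // count[d + 1] first holds the number of keys whose digit equals d.
        std::size_t count[257] = {0};
        for (std::uint32_t k : keys) {
            ++count[((k >> shift) & 0xFFu) + 1];
        }
        // Prefix sum: count[d] becomes the first output slot for digit d.
        for (int d = 0; d < 256; ++d) {
            count[d + 1] += count[d];
        }
        // Stable scatter into the temporary buffer.
        for (std::uint32_t k : keys) {
            tmp[count[(k >> shift) & 0xFFu]++] = k;
        }
        keys.swap(tmp);
    }
}
```

The per-digit offsets come from a prefix sum over the digit counts, the same scan primitive that many parallel radix sorts build on.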

OpenCL Sorting Networks 
This sample implements the bitonic sort algorithm for batches of short arrays.
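A bitonic sorting network can be sketched in serial C++ as shown below. This sorts a single array whose length is a power of two; batching, the name `bitonicSort`, and the use of `int` keys are simplifications assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a in ascending order with a bitonic sorting network.
// The length must be a power of two. All compare-exchange operations inside
// the innermost loop are independent, which is what a GPU version exploits.
void bitonicSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of merged runs
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Runs alternate direction until the final merge (k == n).
                    const bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}
```

Because the sequence of comparisons does not depend on the data, the same network can be applied to many short arrays at once.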

OpenCL Black-Scholes Option Pricing 
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula. A whitepaper is available for this sample.
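The closed-form Black-Scholes prices for one option can be sketched in C++ as follows. The struct and function names are hypothetical, and the normal CDF is computed with `std::erfc` as an assumption of this sketch; the sample may evaluate it differently.

```cpp
#include <cassert>
#include <cmath>

struct OptionPrices {
    double call;
    double put;
};

// Standard normal cumulative distribution function, via the complementary
// error function (an assumption of this sketch).
double normalCdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// s: stock price, x: strike price, t: years to expiry,
// r: risk-free interest rate, v: volatility.
OptionPrices blackScholes(double s, double x, double t, double r, double v) {
    assert(s > 0.0 && x > 0.0 && t > 0.0 && v > 0.0);
    const double sqrtT = std::sqrt(t);
    const double d1 = (std::log(s / x) + (r + 0.5 * v * v) * t) / (v * sqrtT);
    const double d2 = d1 - v * sqrtT;
    const double discountedStrike = x * std::exp(-r * t);
    OptionPrices p;
    p.call = s * normalCdf(d1) - discountedStrike * normalCdf(d2);
    p.put = discountedStrike * normalCdf(-d2) - s * normalCdf(-d1);
    return p;
}
```

Each option is priced independently of the others, so a set of options maps directly onto one work-item per option.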

OpenCL Hidden Markov Model 
This sample implements a Hidden Markov Model in OpenCL for the GPU.

OpenCL Quasirandom Generator 
This sample implements the Niederreiter quasirandom number generator and Moro's Inverse Cumulative Normal Distribution generator.

OpenCL Mersenne Twister 
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
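The Box-Muller step, which turns two uniform random numbers into two normally distributed ones, can be sketched in C++ as below. The sine/cosine form shown here is assumed to be what "Cartesian" refers to; the names `NormalPair` and `boxMuller` are illustrative, and the Mersenne Twister generator itself is not shown.

```cpp
#include <cassert>
#include <cmath>

struct NormalPair {
    double z0;
    double z1;
};

// Maps two independent uniform samples to two independent standard normal
// samples. u1 must lie in (0, 1] so that the logarithm is finite;
// u2 is expected in [0, 1).
NormalPair boxMuller(double u1, double u2) {
    assert(u1 > 0.0 && u1 <= 1.0);
    const double twoPi = 6.283185307179586;
    const double radius = std::sqrt(-2.0 * std::log(u1));
    const double angle = twoPi * u2;
    NormalPair p;
    p.z0 = radius * std::cos(angle);
    p.z1 = radius * std::sin(angle);
    return p;
}
```

In practice the two inputs would come from consecutive outputs of the uniform generator, scaled to the ranges noted in the comments.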

OpenCL 64-bin and 256-bin Histogram 
This sample demonstrates an efficient implementation of 64-bin and 256-bin histograms. A whitepaper is available for this sample.
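As a baseline for what a 256-bin histogram computes, here is a serial C++ sketch over byte-valued input. The function name `histogram256` and the choice of one bin per byte value are assumptions for illustration; the sample's GPU strategy is not shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often each byte value occurs: bins[v] is the number of input
// elements equal to v.
std::array<std::uint32_t, 256> histogram256(const std::vector<std::uint8_t>& data) {
    std::array<std::uint32_t, 256> bins{};  // value-initialized to zero
    for (std::uint8_t value : data) {
        ++bins[value];
    }
    return bins;
}
```

The serial loop is trivial; the difficulty on a GPU is that many threads may try to increment the same bin at once, which is the problem an efficient parallel implementation has to address.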

OpenCL Post-Process OpenGL-Rendered Image 
This sample shows how to post-process an image rendered in OpenGL using OpenCL.

OpenCL Simple Texture 3D 
A simple example that demonstrates the use of 3D textures in OpenCL.

OpenCL Box Filter 
Linear 2-dimensional variable-width box filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The R, G, B and A channels are treated independently, with results computed concurrently for each.

OpenCL Sobel Filter 
2-dimensional 3x3 Sobel magnitude filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The gradient magnitude for each of the R, G and B channels is computed concurrently and independently, and the results are then combined into a single gradient intensity with linear weighting factors.

OpenCL Median Filter 
Multi-GPU enabled, 2-dimensional 3x3 median filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The R, G and B channels are treated independently, with results computed concurrently for each.

OpenCL Separable Convolution 
This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.
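A separable kernel lets a 2D convolution be split into a row pass followed by a column pass with the same 1D kernel. The C++ sketch below illustrates the idea for a single-channel image; the function names, the row-major layout, and treating pixels outside the image as zero are all assumptions of this sketch rather than details taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 1D convolution pass along rows (dx = 1, dy = 0) or columns
// (dx = 0, dy = 1). Pixels outside the image are treated as zero.
std::vector<float> convolvePass(const std::vector<float>& src, int width,
                                int height, const std::vector<float>& kernel,
                                int dx, int dy) {
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int j = -radius; j <= radius; ++j) {
                const int sx = x + j * dx;
                const int sy = y + j * dy;
                if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
                    // The kernel is flipped, as convolution requires.
                    sum += src[sy * width + sx] * kernel[radius - j];
                }
            }
            dst[y * width + x] = sum;
        }
    }
    return dst;
}

// Row pass followed by column pass with the same odd-length 1D kernel.
std::vector<float> convolveSeparable(const std::vector<float>& image, int width,
                                     int height, const std::vector<float>& kernel) {
    assert(width > 0 && height > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(kernel.size() % 2 == 1);
    const std::vector<float> rows = convolvePass(image, width, height, kernel, 1, 0);
    return convolvePass(rows, width, height, kernel, 0, 1);
}
```

For a kernel of length k, the two passes cost roughly 2k multiply-adds per pixel instead of the k squared needed by the equivalent 2D kernel.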

OpenCL Recursive Gaussian Filter 
2-dimensional Gaussian blur filter of an RGBA image using the IRF method. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The R, G, B and A channels are treated independently, with results computed concurrently for each.

OpenCL Volume Rendering
This sample demonstrates basic volume rendering using 3D textures.

OpenCL Particle Collision Simulation 
Simulation of elastic collisions of a large number of bodies. Implemented in OpenCL for CUDA GPUs.

OpenCL N-Body Physics Simulation 
Gravitational simulation of a large number of bodies. Implemented in OpenCL for CUDA GPUs.
