coder.gpuConfig

Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

Description

The coder.gpuConfig object contains the configuration parameters that codegen uses for generating CUDA® MEX, a static library, a dynamically linked library, or an executable program with GPU Coder™. Pass the object to the codegen function by using the -config option.

Creation

Syntax

cfg = coder.gpuConfig(build_type)

cfg = coder.gpuConfig(build_type,'ecoder',false)

cfg = coder.gpuConfig(build_type,'ecoder',true)

Description

cfg = coder.gpuConfig(build_type) creates a code generation configuration object for the specified build type, which can be CUDA MEX, a static library, a dynamically linked library, or an executable program. If the Embedded Coder® product is installed, the function creates a coder.EmbeddedCodeConfig object for static library, dynamic library, or executable build types.

cfg = coder.gpuConfig(build_type,'ecoder',false) creates a code generation configuration object to generate CUDA 'lib', 'dll', or 'exe' output even if the Embedded Coder product is installed.

cfg = coder.gpuConfig(build_type,'ecoder',true) creates a coder.EmbeddedCodeConfig configuration object even if the Embedded Coder product is not installed. However, code generation using a coder.EmbeddedCodeConfig object requires an Embedded Coder license.
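As a minimal sketch of the three syntaxes above, the following code creates configuration objects for a static library and displays the class of each. The variable names are illustrative only, and the classes displayed depend on which products are installed, so no output is shown.

```matlab
% Default behavior: the returned class depends on whether
% Embedded Coder is installed.
cfgDefault = coder.gpuConfig('lib');
disp(class(cfgDefault))

% Request a configuration object without Embedded Coder settings,
% even if the product is installed.
cfgNoEcoder = coder.gpuConfig('lib','ecoder',false);
disp(class(cfgNoEcoder))

% Request a coder.EmbeddedCodeConfig object. Generating code with it
% requires an Embedded Coder license.
cfgEcoder = coder.gpuConfig('lib','ecoder',true);
disp(class(cfgEcoder))
```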

Input Arguments

`build_type` — Output to build from generated CUDA C/C++ code
`'MEX'` | `'LIB'` | `'DLL'` | `'EXE'`

Output to build from generated CUDA C/C++ code, specified as one of the values in this table.

Value	Description
`'MEX'`	CUDA MEX
`'LIB'`	Static library
`'DLL'`	Dynamically linked library
`'EXE'`	Executable program

Properties

The GpuConfig property of the code configuration object contains only the GPU-specific configuration parameters. To see all the properties of the code configuration object, see coder.CodeConfig and coder.EmbeddedCodeConfig.

`Enabled` — Control GPU code generation
`true` (default) | `false`

Control generation of CUDA (*.cu) files by using one of the values in this table.

Value	Description
`true`	This value is the default value. Enables CUDA code generation.
`false`	Disables CUDA code generation.

Example: cfg.GpuConfig.Enabled = true

`MallocMode` — GPU memory allocation
`'discrete'` (default) | `'unified'`

Memory allocation (malloc) mode to be used in the generated CUDA code, specified as one of the values in this table.

Value	Description
`'discrete'`	This value is the default value. The generated code uses the `cudaMalloc` API to allocate GPU memory and transfers data between the CPU and the GPU explicitly. From the programmer's point of view, the discrete mode has a traditional memory architecture with separate CPU and GPU global memory address spaces.
`'unified'`	The generated code uses the `cudaMallocManaged` API that uses a shared (unified) CPU and GPU global memory address space.

For more information, see Discrete and Managed Modes.

Example: cfg.GpuConfig.MallocMode = 'discrete'
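For illustration, a short sketch of switching a configuration object to the unified mode follows. The 'lib' build type is an arbitrary choice for this sketch, and whether the unified mode is supported can depend on the release and the target hardware, so treat this as an assumption to verify for your setup.

```matlab
% Create a configuration object for a static library (arbitrary choice).
cfg = coder.gpuConfig('lib');

% Use cudaMallocManaged (shared CPU and GPU address space) instead of
% the default cudaMalloc-based discrete mode.
cfg.GpuConfig.MallocMode = 'unified';

% Confirm the setting.
disp(cfg.GpuConfig.MallocMode)
```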

`KernelNamePrefix` — Custom kernel name prefixes
' ' (default) | character vector

Specify a custom name prefix for all the kernels in the generated code. For example, using the value 'CUDA_' creates kernels with names CUDA_kernel1, CUDA_kernel2, and so on. If no name is provided, GPU Coder prefixes the kernel name with the name of the entry-point function. Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character _. GPU Coder removes unsupported characters from the kernel names and appends alpha to prefixes that do not begin with an alphabetic letter.

Example: cfg.GpuConfig.KernelNamePrefix = 'myKernel'

`EnableCUBLAS` — Use `cuBLAS` library
`true` (default) | `false`

Replacement of math function calls with NVIDIA® cuBLAS library calls, specified as one of the values in this table.

Value	Description
`true`	This value is the default value. Allows GPU Coder to replace appropriate math function calls with calls to the `cuBLAS` library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB® functions and attempts to map them to the GPU.
`false`	Disables the use of the `cuBLAS` library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUBLAS = true

`EnableCUSOLVER` — Use `cuSOLVER` library
`true` (default) | `false`

Replacement of math function calls with NVIDIA cuSOLVER library calls, specified as one of the values in this table.

Value	Description
`true`	This value is the default value. Allows GPU Coder to replace appropriate math function calls with calls to the `cuSOLVER` library. For functions that have no replacements in CUDA, GPU Coder uses portable MATLAB functions and attempts to map them to the GPU.
`false`	Disables the use of the `cuSOLVER` library in the generated code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUSOLVER = true

`EnableCUFFT` — Use `cuFFT` library
`true` (default) | `false`

Replacement of fft function calls with NVIDIA cuFFT library calls, specified as one of the values in this table.

Value	Description
`true`	This value is the default value. Allows GPU Coder to replace appropriate `fft` calls with calls to the `cuFFT` library.
`false`	Disables use of the `cuFFT` library in the generated code. With this option, GPU Coder uses C `FFTW` libraries where available or generates kernels from portable MATLAB `fft` code.

For more information, see Kernels from Library Calls.

Example: cfg.GpuConfig.EnableCUFFT = true
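The three library properties (EnableCUBLAS, EnableCUSOLVER, and EnableCUFFT) can be set independently. As a sketch, the following configuration turns off all three replacements, which might be useful when the generated code must not depend on these NVIDIA libraries. The 'lib' build type is an arbitrary choice for illustration.

```matlab
% Create a configuration object for a static library (arbitrary choice).
cfg = coder.gpuConfig('lib');

% Turn off replacement of math and fft calls with NVIDIA library calls.
cfg.GpuConfig.EnableCUBLAS   = false;
cfg.GpuConfig.EnableCUSOLVER = false;
cfg.GpuConfig.EnableCUFFT    = false;

% Display the GPU-specific settings to confirm the changes.
disp(cfg.GpuConfig)
```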

`Benchmarking` — Add benchmarking to the generated code
`false` (default) | `true`

Control addition of benchmarking code to the generated CUDA code by using one of the values in this table.

Value	Description
`false`	This value is the default value. The generated CUDA code does not contain benchmarking functionality.
`true`	Generates CUDA code with benchmarking functionality. This option uses CUDA APIs such as `cudaEvent` to accurately time `kernel`, `memcpy`, and other events.

Example: cfg.GpuConfig.Benchmarking = true

`SafeBuild` — Error checking in the generated code
`false` (default) | `true`

Add error-checking functionality to the generated CUDA code by using one of the values in this table.

Value	Description
`false`	This value is the default value. The generated CUDA code does not contain error-checking functionality.
`true`	Generates code with error-checking for CUDA API and kernel calls.

Example: cfg.GpuConfig.SafeBuild = true
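Benchmarking and SafeBuild are both off by default. One plausible use, sketched here as an assumption rather than a recommended workflow, is to enable both while debugging a new entry-point function and to turn them off again for the final build.

```matlab
% Configuration for a debugging build of a CUDA MEX function.
cfg = coder.gpuConfig('mex');

% Add error checking for CUDA API and kernel calls.
cfg.GpuConfig.SafeBuild = true;

% Add timing code for kernel, memcpy, and other events.
cfg.GpuConfig.Benchmarking = true;
```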

`ComputeCapability` — Minimum compute capability for code generation
`'3.5'` (default) | `'3.2'` | `'3.7'` | `'5.0'` | `'5.2'` | `'5.3'` | `'6.0'` | `'6.1'` | `'6.2'` | `'7.0'` | `'7.1'` | `'7.2'`

Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware. Applications use it at run time to determine which hardware features and instructions are available on the present GPU. If you specify the CustomComputeCapability property, GPU Coder ignores this setting.

Example: cfg.GpuConfig.ComputeCapability = '6.1'

`CustomComputeCapability` — Virtual GPU architecture to compile for
`''` (default) | character vector

Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.

For example, to specify a virtual architecture, use -arch=compute_50. To specify a real architecture, use -arch=sm_50. For more information, see the Options for Steering GPU Code Generation topic in the CUDA toolkit documentation.

Example: cfg.GpuConfig.CustomComputeCapability = '-arch=compute_50'
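The following sketch contrasts the two ways of choosing a target architecture. The values are the ones used in the examples above. As described for ComputeCapability, that setting is ignored once CustomComputeCapability is specified.

```matlab
cfg = coder.gpuConfig('mex');

% Option 1: select a minimum compute capability from the predefined list.
cfg.GpuConfig.ComputeCapability = '6.1';

% Option 2: specify the architecture flag directly. When this property is
% set, GPU Coder ignores the ComputeCapability value above.
cfg.GpuConfig.CustomComputeCapability = '-arch=compute_50';
```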

`CompilerFlags` — Additional flags to the GPU compiler
`''` (default) | `character vector`

Pass additional flags to the GPU compiler. For example, --fmad=false instructs the nvcc compiler to disable contraction of floating-point multiply and add to a single Floating-Point Multiply-Add (FMAD) instruction.

For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA toolkit documentation.

Example: cfg.GpuConfig.CompilerFlags = '--fmad=false'

`StackLimitPerThread` — Stack limit per GPU thread
`1024` (default) | `integer`

Specify the maximum stack limit per GPU thread as an integer value.

Example: cfg.GpuConfig.StackLimitPerThread = 1024

`MallocThreshold` — Malloc threshold
`200` (default) | `integer`

Specify the size above which the private variables are allocated on the heap instead of the stack, as an integer value.

Example: cfg.GpuConfig.MallocThreshold = 256
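StackLimitPerThread and MallocThreshold both affect where per-thread data is placed, so they are often adjusted together. The values below are placeholders chosen for illustration, not tuned recommendations.

```matlab
cfg = coder.gpuConfig('mex');

% Raise the maximum stack limit per GPU thread from the default of 1024.
cfg.GpuConfig.StackLimitPerThread = 2048;

% Private variables larger than this size are allocated on the heap
% instead of the stack (default is 200).
cfg.GpuConfig.MallocThreshold = 256;
```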

`SelectCudaDevice` — CUDA device selection
`-1` (default) | `deviceID`

In a multi-GPU environment, such as NVIDIA DRIVE platforms, specify the CUDA device to target.

Example: cfg.GpuConfig.SelectCudaDevice = <DeviceID>

Note

SelectCudaDevice can be used with gpuArray only if gpuDevice and SelectCudaDevice point to the same GPU. If gpuDevice points to a different GPU, a CUDA_ERROR_INVALID_VALUE runtime error is thrown.

Examples

Generate CUDA MEX

Generate a CUDA MEX function from a MATLAB function that is suitable for GPU code generation. Also, enable a code generation report.

Write a MATLAB function, VecAdd, that performs vector addition of inputs A and B.

function [C] = VecAdd(A,B) %#codegen
    C = coder.nullcopy(zeros(size(A)));
    coder.gpu.kernelfun();
    C = A + B;
end

To generate a MEX function, create a code generation configuration object.

cfg = coder.gpuConfig('mex');

Keep cuBLAS library replacement enabled (the default value) and enable the code generation report.

cfg.GpuConfig.EnableCUBLAS = true;
cfg.GenerateReport = true;

Generate a MEX function in the current folder, specifying the configuration object by using the -config option.

% Generate a MEX function and code generation report
codegen -config cfg -args {zeros(512,512,'double'),zeros(512,512,'double')} VecAdd
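As a follow-up sketch, the generated MEX function can be compared against the original MATLAB function. This assumes that the codegen command above completed successfully and produced VecAdd_mex in the current folder. The variable names are illustrative.

```matlab
% Create test inputs that match the sizes and types passed to -args.
A = rand(512,512);
B = rand(512,512);

% Run the generated CUDA MEX function and the original MATLAB function.
C_gpu = VecAdd_mex(A,B);
C_ref = VecAdd(A,B);

% Report the largest elementwise difference between the two results.
maxAbsDiff = max(abs(C_gpu(:) - C_ref(:)));
fprintf('Maximum absolute difference: %g\n', maxAbsDiff);
```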

Limitations

GPU Coder always sets the PassStructByReference property of the code configuration object to true.

Introduced in R2017b