You can use GPU arrays as input and output arguments to an entry-point function when
generating CUDA® MEX, source code, static libraries, dynamic libraries, and executables.
Depending on whether a given input to the entry-point function is identified as a CPU-based
or GPU-based input, and on whether the variable is used on the GPU or on the CPU, the code
generator inserts cudaMemcpy calls efficiently in the generated code. By using the GPU array
functionality, you can minimize the number of cudaMemcpy calls in the generated code.
To use this functionality, do one of the following:
Use coder.typeof to represent the gpuArray type of an entry-point function input. For example:
coder.typeof(rand(20),'Gpu',true);
Use the gpuArray function. For example:
in = gpuArray(rand(1,10));
codegen -config cfg -args {in} test
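The first option above can be sketched end to end as follows. This is a minimal sketch, not a definitive workflow: the body of the entry-point function test is hypothetical (the text names the function but does not show it), and the MEX build type is chosen only for illustration. The 'discrete' memory allocation setting follows the requirement stated later in this topic.

```matlab
% Hypothetical entry-point function, assumed to be saved as test.m:
%
%   function out = test(in) %#codegen
%       coder.gpu.kernelfun;   % map the computation to GPU kernels
%       out = in .* 2;
%   end

% Create a GPU code configuration object for a MEX target.
cfg = coder.gpuConfig('mex');

% GPU arrays require the discrete memory allocation mode.
cfg.GpuConfig.MallocMode = 'discrete';

% Describe the input as a 1-by-10 double array that resides on the GPU.
inType = coder.typeof(rand(1,10),'Gpu',true);

% Generate code for the entry-point function using the GPU input type.
codegen -config cfg -args {inType} test
```

Using coder.typeof describes the input type without allocating an actual array on the GPU at code generation time, which can be convenient when the input is large.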
GPU Coder™ supports all numeric and logical types. The char and half data types are not
supported. For variable-dimension arrays, only bounded types are supported. Scalar GPU
arrays, structures, cell arrays, classes, enumerated types, and fixed-point data types are
not supported.
The code generator supports all target types for GPU arrays: 'mex', 'lib', 'dll', and 'exe'.
For 'lib', 'dll', and 'exe' targets, you must pass the correct pointers to the entry-point
function in the example main function. For example, if an input is marked as 'Gpu', pass a
GPU pointer when the entry-point function is called from the main function.
Software-in-the-loop (SIL) is supported for 'lib' and 'dll'.
The memory allocation (malloc) mode property of the code configuration object must be set
to 'discrete'. For example:
cfg.GpuConfig.MallocMode = 'discrete';
GPU arrays are not supported in the 'unified' memory mode.
During code generation, if one input to the entry-point function is a GPU array, then the
output variables are all GPU array types, provided they are supported for GPU code
generation. For example, if the entry-point function returns a struct, the generated code
returns a CPU output because struct is not supported. However, if a supported matrix type
is returned, then the generated code returns a GPU output.
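The output behavior described above can be checked from MATLAB with a short sketch like the following. It assumes a hypothetical entry-point function test (saved as test.m) that takes one 1-by-10 double input and returns a matrix of a supported type, and it relies on the default MEX name test_mex that codegen produces for a MEX target.

```matlab
% Build a MEX function with a GPU array input.
cfg = coder.gpuConfig('mex');
cfg.GpuConfig.MallocMode = 'discrete';   % required for GPU arrays

in = gpuArray(rand(1,10));               % example input on the GPU
codegen -config cfg -args {in} test

% Call the generated MEX function with the GPU input.
out = test_mex(in);

% For a supported matrix output, the result is expected to stay on the GPU.
disp(class(out))

% Copy the result back to host memory only when the CPU needs it.
hostOut = gather(out);
```

Keeping the output on the GPU until gather is called avoids a device-to-host copy between consecutive GPU calls.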
See also: coder.gpu.constantMemory | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.matrixMatrixKernel | gpucoder.stencilKernel