Create CUDA code for stencil functions
B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...) applies the function FUN to each [M,N] sliding window of the input A. Function FUN is called for each [M,N] submatrix of A and computes an element of output B. The index of this element corresponds to the center of the [M,N] window.
FUN is the handle to a user-defined function that returns a scalar output of the same type as the input.
C = FUN(X,param1,param2,...)
X is the [M,N] submatrix of the original input A. X can be zero-padded when necessary, for instance at the boundaries of input A. X and the window can also be 1-D.
C is a scalar-valued output of FUN. It is the output computed for the center element of the [M,N] array X and is assigned to the corresponding element of the output array B.
param1,param2 are optional arguments. Pass these arguments if FUN requires any parameters in addition to the input window.
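As an illustration of the calling convention described above, here is a minimal sketch of an entry-point function that passes one extra parameter through to FUN. The names stencilMean, scaledMean, and gain are hypothetical and chosen only for this example; the 3-by-3 window and the 'same' shape are likewise assumptions.

```matlab
function B = stencilMean(A) %#codegen
% Hypothetical entry point: scaled 3-by-3 mean of each window of A.
% The value after the shape argument is forwarded to FUN as param1.
gain = cast(2, 'like', A);
B = gpucoder.stencilKernel(@scaledMean, A, [3 3], 'same', gain);
end

function c = scaledMean(X, gain)
% X is the [3,3] window (zero-padded at the borders of A).
% The result is cast back so that it has the same type as the input.
c = cast(gain * mean(X(:)), 'like', X);
end
```

The explicit cast keeps the scalar output in the class of the input, which matters for integer inputs because mean returns a floating-point value for them.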
The window [M,N] must be less than or equal to the size of A, with the same shape as A.
If A is a 1-D row vector, the window must be [1,N].
If A is a 1-D column vector, the window must be [N,1].
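The 1-D case can be sketched as follows, assuming A is a row vector so that the window is [1,3]. The function names movingSum3 and sum3 are hypothetical; for a column vector the window would be [3 1] instead.

```matlab
function B = movingSum3(A) %#codegen
% Hypothetical entry point: A is assumed to be a 1-by-K row vector,
% so the window has the same orientation, [1 3].
B = gpucoder.stencilKernel(@sum3, A, [1 3], 'same');
end

function c = sum3(X)
% X is the 1-by-3 window; summing elements of one class keeps that class.
c = X(1) + X(2) + X(3);
end
```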
shape determines the size of the output array B. It can have one of three possible values:
'same' - Returns output B that is the same size as A.
'full' - (default) Returns the full output, which is larger than A. If A is of size (x,y), the size of B is [x + floor(M/2), y + floor(N/2)].
'valid' - Returns only those parts of the output that are computed without the zero-padded edges of A. If A is of size (x,y), the size of B is [x - floor(M/2), y - floor(N/2)].
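The size rules above can be written out as a small helper. This is only a sketch that evaluates the formulas exactly as stated in this section; expectedStencilSize is a hypothetical name, not part of GPU Coder, and the result is worth confirming against size(B) for your own inputs.

```matlab
function szB = expectedStencilSize(szA, win, shape)
% Hypothetical helper: output size according to the formulas in the text.
%   szA   - size of the input A, for example [x y]
%   win   - window size [M N]
%   shape - 'same', 'full', or 'valid'
half = floor(win / 2);
switch shape
    case 'same'
        szB = szA;           % same size as A
    case 'full'
        szB = szA + half;    % [x + floor(M/2), y + floor(N/2)]
    case 'valid'
        szB = szA - half;    % [x - floor(M/2), y - floor(N/2)]
    otherwise
        error('Unknown shape: %s', shape);
end
end
```

For a 480-by-640 input and a [3 3] window, the helper gives [481 641] for 'full' and [479 639] for 'valid'.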
The input A must be a vector or matrix with a numeric type supported by FUN. The class of B is the same as the class of A.
Code generation is supported only for fixed-size outputs. The shape and window arguments must be compile-time constants because they determine the size of the output.
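One way to meet the fixed-size requirement is to give codegen a fixed-size input type and to write the window and shape as literals in the entry-point function. The commands below are a sketch under those assumptions: stencilMean is a hypothetical entry-point file that calls gpucoder.stencilKernel, and the 256-by-256 single input is an arbitrary choice.

```matlab
% Assumes stencilMean.m (hypothetical) exists on the path and calls
% gpucoder.stencilKernel with a literal window, such as [3 3], and a
% literal shape, such as 'same'.
cfg = coder.gpuConfig('mex');                     % GPU Coder MEX target
inputType = coder.typeof(single(0), [256 256]);   % fixed-size 256x256 single
codegen -config cfg -args {inputType} stencilMean
```

Because the input type is fixed-size and the window and shape are constants in the source, the size of B can be determined at compile time.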
For very large input sizes, the gpucoder.stencilKernel function may produce CUDA code that does not numerically match the MATLAB® simulation. In such cases, consider reducing the size of the input to produce accurate results.
See also: codegen | coder.gpu.constantMemory | coder.gpu.kernel | coder.gpu.nokernel | gpucoder.matrixMatrixKernel | gpucoder.reduce | gpucoder.sort