gpucoder.reduce

Optimized GPU implementation for reduction operations

Description

S = gpucoder.reduce(A,FUN) aggregates the values present in the input array A to a single value using the given function handle FUN. The output S is a scalar.

S = gpucoder.reduce(A,{@FUN1,@FUN2,...}) accepts an input array and a cell array of function handles. It aggregates the values in the input array to a single value for each function handle in the cell array. The size of the output is 1-by-N, where N is the number of function handles.

The code generator uses shuffle intrinsics to perform efficient reduction on the GPU. Multiple function handles are aggregated inside a single kernel on the GPU.
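
For the single function handle syntax, an entry-point function can be as simple as the following sketch (the name sumreduce is a placeholder; mysum mirrors the helper used in the Examples section):

function s = sumreduce(A)
  % Reduce all elements of A to a single scalar with one function handle.
  s = gpucoder.reduce(A, @mysum);
end

function c = mysum(a, b)
  c = a + b;
end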

Examples

This example generates CUDA® code to find the sum and the maximum of the elements of an array.

In one file, write an entry-point function multireduce that accepts a matrix input A. Use the gpucoder.reduce function to perform two types of reduction operations on the elements of A.

function s = multireduce(A)
  s = gpucoder.reduce(A, {@mysum, @mymax}); 
end

function c = mysum(a, b)
  c = a+b;
end

function c = mymax(a, b)
  c = max(a,b);
end

Use the codegen function to generate a CUDA MEX function.

codegen -config coder.gpuConfig('mex') -args {rand(1,1024,'double')} -report multireduce

The following is a snippet of the generated code.

...
cudaMalloc(&gpu_s, 16ULL);
cudaMalloc(&gpu_A, 8192ULL);
cudaMemcpy(gpu_A, (void *)&A[0], 8192ULL, cudaMemcpyHostToDevice);
multireduce_kernel1<<<dim3(1U, 1U, 1U), dim3(32U, 1U, 1U)>>>(*gpu_A, *gpu_s);
coder_reduce0<<<dim3(2U, 1U, 1U), dim3(512U, 1U, 1U)>>>(*gpu_A, *gpu_s);
cudaMemcpy(&s[0], gpu_s, 16ULL, cudaMemcpyDeviceToHost);
...
static __inline__ __device__ real_T shflDown2(real_T in1, uint32_T offset,
  uint32_T mask)
{
  int2 tmp;
  tmp = *(int2 *)&in1;
  tmp.x = __shfl_down_sync(mask, tmp.x, offset);
  tmp.y = __shfl_down_sync(mask, tmp.y, offset);
  return *(real_T *)&tmp;
}
...

Input Arguments

A is the input array on which to perform the reduction operation. For code generation, the input array must be of a numeric or logical data type.

FUN is a handle to a user-defined function; it can also be a cell array of function handles. Each function handle must refer to a binary function that satisfies the following requirements (illustrated in the sketch after this list):

  • Accept two inputs and return one output. The types of the inputs and the output must match the type of the input array A.

  • Be commutative and associative; otherwise, the behavior is undefined.
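
For instance (a hypothetical sketch, not from the shipped documentation), a multiplication-based handle meets both requirements, whereas a handle such as @minus does not, because subtraction is neither commutative nor associative:

function c = myprod(a, b)
  % Valid reduction function: scalar multiplication is commutative and
  % associative, and a, b, and c all share the type of the input array.
  c = a * b;
end

gpucoder.reduce(A, @myprod) is then well defined, while gpucoder.reduce(A, @minus) has undefined behavior.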

Output Arguments

S is the result of the reduction operation. During reduction, S is initialized to the value of one of the elements of the input array A. The reduction is then performed by repeatedly applying FUN to S and the elements of A.
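
Conceptually, the reduction corresponds to a sequential loop like the following sketch (one possible evaluation order, for illustration only; the generated code performs the reduction in parallel on the GPU, and the actual order is unspecified):

A   = 1:10;
fun = @(a, b) a + b;      % stands in for the user-defined FUN
s   = A(1);               % S starts from one element of A
for k = 2:numel(A)
    s = fun(s, A(k));     % FUN combines S with the remaining elements
end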

Limitations

  • gpucoder.reduce does not support input arrays of a complex data type.

  • The user-defined function must accept two inputs and return one output. The types of the inputs and the output must match the type of the input array A.

  • The user-defined function must be commutative and associative; otherwise, the behavior is undefined.

  • For some inputs of integer data types, the generated code can contain intermediate computations that reach saturation. In such cases, the results from the generated code might not match the simulation results from MATLAB®. The sketch after this list illustrates why.
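
Saturating integer arithmetic is not associative, so different evaluation orders can produce different results. The following sketch (plain MATLAB, not generated code) shows the effect for int8 addition:

a = int8(100); b = int8(100); c = int8(-100);
r1 = (a + b) + c    % 100 + 100 saturates to 127, then 127 - 100 = 27
r2 = a + (b + c)    % 100 + 0 = 100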

Introduced in R2019b