Optimized GPU implementation of strided and batched matrix multiply operation
D = gpucoder.stridedMatrixMultiply(A,B) performs strided matrix-matrix multiplication of a batch of matrices. The input matrices A and B for each instance of the batch are located at fixed address offsets from their addresses in the previous instance.
gpucoder.stridedMatrixMultiply performs matrix-matrix multiplication of the form:

D = αAB

where α is a scalar multiplication factor, and A, B, and D are matrices with dimensions m-by-k, k-by-n, and m-by-n, respectively. A and B can optionally be transposed or Hermitian-conjugated. By default, α is set to one and the matrices are not transposed. Use Name,Value pair arguments to specify a different scalar multiplication factor and to specify transpose operations on the input matrices.
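
The per-instance computation described above can be modeled in base MATLAB with a plain loop. This is only a CPU reference sketch of the semantics, not the optimized GPU code path; it assumes the batch is stored along the third array dimension, and the variable names (batch, alpha) are illustrative.

```matlab
% Reference model of D = alpha*A*B applied to every instance of a batch.
% Assumption: instances are stacked along the third dimension.
m = 2; k = 3; n = 4; batch = 5;
alpha = 0.5;                      % scalar multiplication factor

A = rand(m, k, batch);            % each A(:,:,i) is m-by-k
B = rand(k, n, batch);            % each B(:,:,i) is k-by-n
D = zeros(m, n, batch);           % each D(:,:,i) is m-by-n

for i = 1:batch
    % Plain product; use A(:,:,i).' for a transpose or A(:,:,i)' for a
    % Hermitian (complex conjugate) transpose of an operand.
    D(:,:,i) = alpha * (A(:,:,i) * B(:,:,i));
end

disp(size(D))                     % size is [2 4 5]
```

A loop like this is useful as a ground truth when checking generated code against MATLAB results.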
All the batches passed to the gpucoder.stridedMatrixMultiply function must be uniform. That is, all instances must have the same dimensions m, n, and k.
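
Uniform dimensions are what make a fixed address offset possible. The sketch below illustrates this with MATLAB's column-major storage: in an m-by-k-by-batch array, consecutive instances start exactly m*k elements apart. That the function relies on this particular layout is an assumption made for illustration; strideA and flat are hypothetical names.

```matlab
% Fixed offsets between instances in a column-major 3-D array.
m = 3; k = 2; batch = 4;
A = reshape(1:m*k*batch, m, k, batch);

strideA = m * k;                  % elements between consecutive instances
flat = A(:);                      % elements in storage (column-major) order

for i = 1:batch
    offset = (i - 1) * strideA;   % fixed offset from the first instance
    Ai = reshape(flat(offset + (1:strideA)), m, k);
    assert(isequal(Ai, A(:,:,i)));   % same data as indexing the page
end
```

If the instances had different sizes, no single stride value could locate every instance, which is why the batch must be uniform.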
___ = gpucoder.stridedMatrixMultiply(___,Name,Value) performs the strided batched matrix multiply operation using the options specified by one or more Name,Value pair arguments.
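
A call with Name,Value pairs might look like the following sketch. The text above does not list the option names, so 'alpha', 'transpose', and the two-character code 'NN' (no transpose on A or B) are assumptions to check against the function's argument list; stridedMultiplySketch is a hypothetical wrapper name, and the inputs are assumed to be 3-D arrays with the batch along the third dimension. Running it requires GPU Coder.

```matlab
function D = stridedMultiplySketch(A, B, alpha)
% Hypothetical wrapper around the strided batched multiply.
% Assumed, not confirmed by the text above:
%   'alpha'     - scalar multiplication factor
%   'transpose' - two characters, one per operand (here 'NN' = as-is)
    D = gpucoder.stridedMatrixMultiply(A, B, ...
        'alpha', alpha, 'transpose', 'NN');
end
```

Under those assumptions, stridedMultiplySketch(rand(2,3,4), rand(3,5,4), 0.5) would be expected to return a 2-by-5-by-4 array.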
coder.gpu.constantMemory | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.batchedMatrixMultiply | gpucoder.batchedMatrixMultiplyAdd | gpucoder.sort | gpucoder.stencilKernel | gpucoder.stridedMatrixMultiplyAdd