Optimized GPU implementation of batched matrix multiply operation
[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2) performs matrix-matrix multiplication on a batch of matrix pairs A1,B1 and A2,B2. The gpucoder.batchedMatrixMultiply function performs matrix-matrix multiplication of the form:

D = αAB

where α is a scalar multiplication factor, and A, B, and D are matrices with dimensions m-by-k, k-by-n, and m-by-n, respectively. You can optionally transpose or Hermitian-conjugate A and B. By default, α is set to one and the matrices are not transposed. To specify a different scalar multiplication factor and perform transpose operations on the input matrices, use the Name,Value pair arguments.
All the batches passed to the gpucoder.batchedMatrixMultiply function must be uniform. That is, all instances must have the same dimensions m, n, and k.
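As a minimal sketch of the default behavior described above, the following script builds a uniform batch of two multiplications and compares each output with the ordinary MATLAB product. It assumes GPU Coder is installed and that the function can be called directly in MATLAB; the sizes and the variable names P1, P2, err1, and err2 are illustrative only.

```matlab
% Uniform batch: both instances share the same m, n, and k.
m = 4; k = 3; n = 5;
A1 = rand(m,k); B1 = rand(k,n);
A2 = rand(m,k); B2 = rand(k,n);

% Batched call in the form given above (assumes GPU Coder is available).
[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2);

% With the defaults (scale factor of one, no transposes), each output
% is expected to match the plain matrix product of its input pair.
P1 = A1*B1;
P2 = A2*B2;
err1 = max(abs(D1(:) - P1(:)));
err2 = max(abs(D2(:) - P2(:)));
```

Both err1 and err2 should be at or near floating-point round-off if the outputs agree with the plain products.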
___ = gpucoder.batchedMatrixMultiply(___,Name,Value) performs the batched matrix multiply operation by using the options specified by one or more Name,Value pair arguments.
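The following entry-point function sketches how the Name,Value form might be used to scale the result and transpose both inputs. This section does not list the option names, so 'alpha' and 'transpose' (with the value 'TT') are assumptions here and should be checked against the function's Name-Value argument list. The function name scaledBatchMatMul is hypothetical.

```matlab
function [D1,D2] = scaledBatchMatMul(A1,B1,A2,B2,alpha) %#codegen
% Hypothetical entry-point function, saved as scaledBatchMatMul.m.
% Assumed option names: 'alpha' (scalar factor) and 'transpose'
% (two characters, one per input; 'TT' requests A.' and B.').
% With 'TT', each A is k-by-m and each B is n-by-k, so that
% D = alpha * A.' * B.' is m-by-n.
[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2, ...
    'alpha',alpha,'transpose','TT');
end
```

Because the batch must be uniform, A1 and A2 need identical sizes, as do B1 and B2. A function like this would typically be passed to codegen with a GPU code configuration object to generate CUDA code.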
See also: codegen | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.batchedMatrixMultiplyAdd | gpucoder.stridedMatrixMultiply | gpucoder.stridedMatrixMultiplyAdd