Create GPU CUDA kernel object from PTX and CU code
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC) create a CUDAKernel object that you can use to call a CUDA kernel on the GPU. PTXFILE is the name of the file that contains the PTX code, or the contents of a PTX file as a character vector, and CPROTO is the C prototype for the kernel call that KERN represents. If specified, FUNC must be a character vector that unambiguously defines the appropriate kernel entry name in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point.
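As a minimal sketch of the three-argument form, suppose a PTX file holds more than one entry point. The file name myKernels.ptx and the kernel names addToVector and scaleVector below are hypothetical, chosen only for illustration, and both kernels are assumed to share the same prototype. Supplying FUNC picks out the entry whose name it matches unambiguously.

```matlab
% Hypothetical file and kernel names, for illustration only.
% myKernels.ptx is assumed to contain two entry points, so FUNC is needed
% to say which one each kernel object should represent.
proto = 'float *, float, int';   % C prototype assumed for both kernels

% Each FUNC value must match exactly one entry name in the PTX file.
addKern   = parallel.gpu.CUDAKernel('myKernels.ptx', proto, 'addToVector');
scaleKern = parallel.gpu.CUDAKernel('myKernels.ptx', proto, 'scaleVector');

% EntryPoint reports the entry name as stored in the PTX file, which is
% typically a C++-mangled form of the kernel name.
disp(addKern.EntryPoint)
```

If the PTX file had only one entry point, the FUNC argument could be dropped from both calls.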
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC) create a kernel object that you can use to call a CUDA kernel on the GPU. In addition, they read the CUDA source file CUFILE and look for a kernel definition starting with '__global__' to find the function prototype for the CUDA kernel that is defined in PTXFILE.
For information on executing your kernel object, see Run a CUDAKernel.
If simpleEx.cu contains the following:
/*
 * Add a constant to a vector.
 */
__global__ void addToVector(float * pi, float c, int vecLen) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < vecLen) {
        pi[idx] += c;
    }
}
and simpleEx.ptx contains the PTX resulting from compiling simpleEx.cu into PTX, both of the following statements return a kernel object that you can use to call the addToVector CUDA kernel.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                               'simpleEx.cu');
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                               'float *,float,int');
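Once the kernel object exists, it can be launched with feval. The following is a sketch rather than a definitive recipe: it assumes simpleEx.ptx and simpleEx.cu are on the MATLAB path and that a supported GPU is available, and the vector length and block size are arbitrary illustrative values.

```matlab
% Sketch only: assumes simpleEx.ptx and simpleEx.cu are on the MATLAB path
% and that a supported GPU is available.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu');

N = 1000;                               % illustrative vector length
kern.ThreadBlockSize = 256;             % threads per block (assumed value)
kern.GridSize = ceil(N / 256);          % enough blocks to cover all N elements

x = ones(N, 1, 'single', 'gpuArray');   % single-precision data on the GPU

% The non-const pointer argument pi is treated as an in-out argument, so
% feval returns the updated vector as its output.
y = feval(kern, x, single(2.5), int32(N));

result = gather(y);                     % copy the result back to host memory
```

Because the grid is rounded up to whole blocks, some threads have an index past the end of the vector; the idx < vecLen check in the kernel keeps them from writing out of bounds. Under these assumptions, each element of result is expected to be 3.5.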
See also: arrayfun | existsOnGPU | feval | gpuArray | reset