This example shows how to perform fine-grained analysis of a MATLAB algorithm and its generated CUDA code through software-in-the-loop (SIL) execution profiling. The Embedded Coder® product must be installed to generate the execution profiling report.
Note
The profiling workflow depends on the nvprof tool from NVIDIA®. In CUDA® toolkit v10.1, NVIDIA restricts access to performance counters to admin users only. To enable GPU performance counters for all users, see the instructions provided in https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters.
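Because the workflow relies on nvprof, a quick check that the tool is reachable from the MATLAB session can save a failed profiling run. The following is a minimal sketch, assuming nvprof is expected on the system path. The variable names are illustrative, and the check does not verify performance-counter permissions.

```matlab
% Ask the operating system shell for the nvprof version string.
% A nonzero status usually means nvprof is not on the system path.
[status, out] = system('nvprof --version');
nvprofFound = (status == 0);
if nvprofFound
    fprintf('nvprof found:\n%s\n', out);
else
    fprintf('nvprof was not found on the system path.\n');
end
```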
For this example, create an entry-point function that performs an N-D fast Fourier transform. Use the coder.gpu.kernelfun pragma to map the FFT to the GPU. By default, the EnableCUFFT property is enabled, so the code generator uses the cuFFT library to perform the FFT operation.
function [Y] = gpu_fftn(X)
    coder.gpu.kernelfun();
    Y = fftn(X);
end
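As a sanity check on what the entry-point function computes, the N-D transform returned by fftn is equivalent to applying the 1-D fft along each dimension in turn. The following is a minimal sketch in plain MATLAB, using the same input size as the profiling call in this example. The variable names are illustrative assumptions, and the comparison runs on the host only, not on the generated code.

```matlab
X = rand(2,4500,4);                       % same size as the profiled input
Y = fftn(X);                              % N-D FFT, as computed in gpu_fftn
Yref = fft(fft(fft(X,[],1),[],2),[],3);   % 1-D FFT along each dimension in turn
% Relative difference between the two results (rounding error only)
relErr = max(abs(Y(:) - Yref(:))) / max(abs(Y(:)));
fprintf('Relative difference: %g\n', relErr);
```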
Use the gpucoder.profile function to generate the execution profiling report.
cfg = coder.gpuConfig('exe');
cfg.GpuConfig.MallocMode = 'discrete';
gpucoder.profile('gpu_fftn',{rand(2,4500,4)},'CodegenConfig',cfg, ...
    'CodegenArguments','-d profilingdir','Threshold',0.001)
The code execution profiling report opens. This report provides metrics based on data collected from a SIL execution. Execution times are calculated from data recorded by instrumentation probes added to the SIL test harness or inside the code generated for each component. See View Execution Times (Embedded Coder) for more information.
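For context when reading the report, a rough host baseline can be measured in MATLAB itself. This is a sketch only: timeit measures the MATLAB fftn call on the CPU, not the generated CUDA code, so the result is not directly comparable with the SIL execution times and gives an order of magnitude at best. The variable names are illustrative.

```matlab
X = rand(2,4500,4);          % same input size as the profiling call
f = @() fftn(X);             % zero-argument handle, as timeit requires
tHost = timeit(f);           % typical run time of f, in seconds
fprintf('Host fftn time: %.3g ms\n', tHost*1e3);
```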
See also: codegen | coder.EmbeddedCodeConfig | gpucoder.profile