Analyze Execution Profiles of the Generated Code

This example shows how to perform fine-grained analysis of a MATLAB algorithm and its generated CUDA code through software-in-the-loop (SIL) execution profiling. The Embedded Coder® product must be installed to generate the execution profiling report.

Note

The profiling workflow depends on the nvprof tool from NVIDIA®. In CUDA® toolkit v10.1, NVIDIA restricts access to performance counters to admin users only. To let all users access the GPU performance counters, follow the instructions at https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters.

Create a Design File

For this example, create an entry-point function that performs an N-D fast Fourier transform. Use the coder.gpu.kernelfun pragma to map the FFT to the GPU. The EnableCUFFT property is enabled by default, so the code generator uses the cuFFT library to perform the FFT operation.

function [Y] = gpu_fftn(X)
  coder.gpu.kernelfun();
  Y = fftn(X);
end
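To illustrate what gpu_fftn computes, the following sketch relies on the documented equivalence between fftn and applying fft along each dimension in turn. It is an illustrative check, not part of the profiling workflow. It runs in plain MATLAB without GPU Coder and uses the same 2-by-4500-by-4 input size as the profiling call. The variable names are hypothetical.

```matlab
% Input with the same size as the one passed to gpucoder.profile
X = rand(2,4500,4);

% fftn applies a 1-D FFT along each dimension in turn
Yref = fft(fft(fft(X,[],1),[],2),[],3);
Y = fftn(X);

% The two results agree up to floating-point round-off
relErr = max(abs(Y(:) - Yref(:))) / max(abs(Y(:)));
fprintf('Relative difference: %g\n', relErr);
```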

Generate the Execution Profiling Report

Use the gpucoder.profile function to generate the execution profiling report.

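cfg = coder.gpuConfig('exe');           % configuration for a standalone executable
cfg.GpuConfig.MallocMode = 'discrete';  % allocate separate host and device memory
% Profile gpu_fftn on a 2-by-4500-by-4 input; generated files go to profilingdir
gpucoder.profile('gpu_fftn',{rand(2,4500,4)},'CodegenConfig',cfg, ...
    'CodegenArguments','-d profilingdir','Threshold',0.001)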
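The input example {rand(2,4500,4)} also determines how much data the executable moves between host and device. When MallocMode is 'discrete', the generated code uses separate device memory. The input is therefore typically copied to the GPU, and the complex result is copied back. The following MATLAB arithmetic estimates those sizes for a double-precision input. It is an illustrative sketch, and the variable names are hypothetical.

```matlab
% Same input size and class as in the gpucoder.profile call
X = rand(2,4500,4);

% Real double input: 8 bytes per element
inBytes = numel(X) * 8;

% fftn of a real input returns complex double: 16 bytes per element
outBytes = numel(X) * 16;

fprintf('Input: %d bytes, output: %d bytes\n', inBytes, outBytes);
```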

The code execution profiling report opens. This report provides metrics based on data collected from a SIL execution. Execution times are calculated from data recorded by instrumentation probes added to the SIL test harness or inside the code generated for each component. See View Execution Times (Embedded Coder) for more information.
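Probe-based measurement can be illustrated in plain MATLAB. Record a timestamp when a section of code starts and another when it ends, then aggregate the elapsed times over several calls. The following sketch is only a simplified host-side analogue using tic and toc. It is not how Embedded Coder instruments the generated code, and numCalls is an arbitrary, hypothetical choice.

```matlab
% Simplified analogue of entry/exit instrumentation probes
X = rand(2,4500,4);
numCalls = 5;                 % hypothetical number of profiled calls
t = zeros(1, numCalls);

for k = 1:numCalls
    t0 = tic;                 % "entry" probe: start timestamp
    Y = fftn(X);
    t(k) = toc(t0);           % "exit" probe: elapsed seconds for this call
end

% Aggregate the per-call measurements
fprintf('Maximum %.6f s, average %.6f s over %d calls\n', ...
    max(t), mean(t), numCalls);
```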
