The easiest way to create CUDA® kernels is to place the coder.gpu.kernelfun pragma into your primary MATLAB® function. The primary function is also known as the top-level or entry-point function. When GPU Coder™ encounters the kernelfun pragma, it attempts to parallelize all the computation within this function and then maps it to the GPU.
For more information about GPU kernels, see GPU Programming Paradigm.
In this tutorial, you learn how to:
Prepare your MATLAB code for CUDA code generation by using the kernelfun pragma.
Create and set up a GPU Coder project.
Define function input properties.
Check for code generation readiness and run-time issues.
Specify code generation properties.
Generate CUDA C code by using the GPU Coder app.
This tutorial requires the following products:
MATLAB
MATLAB Coder™
GPU Coder
C compiler
NVIDIA® GPU enabled for CUDA
CUDA toolkit and driver
Environment variables for the compilers and libraries. For more information, see Environment Variables.
You do not have to be familiar with the algorithm in the example to complete the tutorial.
The Mandelbrot set is the region in the complex plane consisting of the values z0 for which the trajectories defined by this equation remain bounded as k→∞:

$$z_{k+1} = z_k^2 + z_0, \quad k = 0, 1, \ldots$$
The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.
For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000x1000 grid of real parts (x) and imaginary parts (y) is created between these two limits. The Mandelbrot algorithm is then iterated at each grid location. An iteration number of 500 is enough to render the image in full resolution.
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];
This tutorial uses an implementation of the Mandelbrot set that relies on standard MATLAB commands running on the CPU. This implementation is based on the code provided in the Experiments with MATLAB e-book by Cleve Moler. The calculation is vectorized so that every location is updated simultaneously.
Create a MATLAB function called mandelbrot_count.m with the following lines of code. This code is a baseline vectorized MATLAB implementation of the Mandelbrot set. For every point (xGrid,yGrid) in the grid, it calculates the iteration index count at which the trajectory defined by the equation reaches a distance of 2 from the origin. It then returns the natural logarithm of count, which is used to generate the color-coded plot of the Mandelbrot set. Later in this tutorial, you modify this file to make it suitable for code generation.
function count = mandelbrot_count(maxIterations,xGrid,yGrid)
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    inside = abs(z)<=2;
    count = count + inside;
end
count = log(count);
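To make the meaning of count concrete, the following sketch runs the same loop inline on two hand-picked points. The two-element vector z0 is an illustrative assumption, not part of the tutorial files: 0 lies inside the set, and 2 leaves the radius-2 disk on the first update.

```matlab
% Same update rule as mandelbrot_count, applied to two sample points
maxIterations = 500;
z0 = [0, 2];               % hypothetical sample points: 0 stays bounded, 2 escapes
count = ones(size(z0));

z = z0;
for n = 0:maxIterations    % the loop body runs maxIterations + 1 times
    z = z.*z + z0;
    inside = abs(z) <= 2;  % true while the trajectory is within distance 2
    count = count + inside;
end
count = log(count);

% count(1) equals log(maxIterations + 2) because the point never escapes;
% count(2) equals log(1) = 0 because the point escapes immediately.
disp(count)
```

Points that stay bounded therefore take the largest value, and points that escape quickly take values near zero, which is what the color map in the plot encodes.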
Create a MATLAB script called mandelbrot_test.m with the following lines of code. The script generates a 1000x1000 grid of real parts (x) and imaginary parts (y) between the limits specified by xlim and ylim. It also calls the mandelbrot_count function and plots the resulting Mandelbrot set.
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];

x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

%% Mandelbrot computation in MATLAB
count = mandelbrot_count(maxIterations,xGrid,yGrid);

% Show
figure(1)
imagesc(x,y,count);
colormap([jet();flipud(jet());0 0 0]);
axis off
title('Mandelbrot set with MATLAB');
Before making the MATLAB version of the Mandelbrot set algorithm suitable for code generation, you can test the functionality of the original code.
Change the current MATLAB working folder to the location that contains mandelbrot_count.m and mandelbrot_test.m. GPU Coder places generated code in this folder. Change your current working folder if you do not have full access to this folder.
Run the mandelbrot_test script.
The test script runs and shows the geometry of the Mandelbrot set within the boundary set by the variables xlim and ylim.
Before you generate code with GPU Coder, check for coding issues in the original MATLAB code.
There are two tools that help you detect code generation issues at design time:
Code Analyzer tool
Code generation readiness tool
The Code Analyzer is a tool incorporated into the MATLAB Editor that continuously checks your code as you enter it. The Code Analyzer reports issues and recommends modifications to maximize performance and maintainability of your code. To identify the warnings and errors specific to code generation from your MATLAB code, add the %#codegen directive to your MATLAB file. For more information, see Code Analyzer preferences (MATLAB).
The Code Analyzer does not detect all code generation issues. After eliminating the errors or warnings that the Code Analyzer detects, compile your code with GPU Coder to determine if the code has other compliance issues.
The code generation readiness tool screens the MATLAB code for features and functions that are not supported for code generation. This tool provides a report that lists issues and recommendations for making the MATLAB code suitable for code generation. You can access the code generation readiness tool in these ways:
In the Current Folder browser: right-click the MATLAB file that contains the entry-point function.
At the command line: use the coder.screener() function.
In the GPU Coder app: after you specify the entry-point files, the app runs the Code Analyzer and the code generation readiness tool.
You can use GPU Coder to check for issues at code generation time. When GPU Coder detects errors or warnings, it generates an error report that describes the issues and provides links to the problematic MATLAB code. For more information, see Code Generation Reports (MATLAB Coder).
To begin the process of making your MATLAB code suitable for code generation, use the file mandelbrot_count.m.
Set your MATLAB current folder to the work folder that contains your files for this tutorial.
In the MATLAB Editor, open mandelbrot_count.m. The Code Analyzer message indicator at the top right corner of the MATLAB Editor is green. The analyzer did not detect errors, warnings, or opportunities for improvement in the code.
After the function declaration, add the %#codegen directive to turn on the error checking that is specific to code generation.
function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen
The Code Analyzer message indicator remains green, indicating that it has not detected any code generation issues.
To map the mandelbrot_count function to a CUDA kernel, modify the original MATLAB code by placing the coder.gpu.kernelfun pragma in the body of the function.
function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen

% Add kernelfun pragma to trigger kernel creation
coder.gpu.kernelfun;

% mandelbrot computation
z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    inside = abs(z)<=2;
    count = count + inside;
end
count = log(count);
If you use the coder.gpu.kernelfun pragma, GPU Coder attempts to map the computations in the function mandelbrot_count to the GPU.
Save the file. You are now ready to compile your code by using the GPU Coder app.
On the MATLAB toolstrip Apps tab, under Code Generation, click the GPU Coder app icon. You can also open the app by typing gpucoder in the MATLAB Command Window. The app opens the Select source files page.
On the Select source files page, enter or select the name of the primary function, mandelbrot_count. The primary function is also known as the top-level or entry-point function. The app creates a project with the default name mandelbrot_count.prj in the current folder.
Click Next and go to the Define Input Types step. The app analyzes the function for coding issues and code generation readiness. If the app identifies issues, it opens the Review Code Generation Readiness page where you can review and fix issues. In this example, because the app does not detect issues, it opens the Define Input Types page.
The code generator must determine the data types of all the variables in the MATLAB files at compile time. Therefore, you must specify the data types of all the input variables. You can specify the input data types in one of these two ways:
Provide a test file that calls the project entry-point functions. The GPU Coder app can infer the input argument types by running the test file.
Enter the input types directly.
For more information about input specifications, see Input Specification (MATLAB Coder).
In this example, to define the properties of the inputs maxIterations, xGrid, and yGrid, specify the test file mandelbrot_test.m:
Enter or select the test file mandelbrot_test.m.
Click Autodefine Input Types.
The test file mandelbrot_test.m calls the entry-point function mandelbrot_count.m with the expected input types. The app infers that the input maxIterations is double(1x1) and the inputs xGrid and yGrid are double(1000x1000).
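As a quick sanity check, the classes and sizes that the app reports can be reproduced at the command line. This sketch only repeats the grid setup from mandelbrot_test.m and inspects the resulting variables; it does not call GPU Coder, and the fprintf formatting is an illustrative choice.

```matlab
% Rebuild the inputs the same way mandelbrot_test.m does
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];
x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

% Print the class and size of each entry-point input
fprintf('maxIterations: %s %s\n', class(maxIterations), mat2str(size(maxIterations)));
fprintf('xGrid:         %s %s\n', class(xGrid), mat2str(size(xGrid)));
fprintf('yGrid:         %s %s\n', class(yGrid), mat2str(size(yGrid)));
```

If the printed classes and sizes differ from what the app infers, the test file is probably not the one the project is using.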
Click Next to go to the Check for Run-Time Issues step.
The Check for Run-Time Issues step generates a MEX file from your entry-point functions, runs the MEX function, and reports issues. This step is optional. However, it is a best practice to perform this step. Using this step, you can detect and fix defects that are harder to diagnose in the generated GPU code.
GPU Coder provides the option to perform GPU-specific checks at this point. When you select this option, GPU Coder generates CUDA C code and a MEX file from your entry-point functions, runs the MEX function, and reports issues. Some of the GPU-specific run-time checks include:
Checks for register spills.
Stack size conformance checks.
There may be certain MATLAB constructs in your code that cause the Check for Run-Time Issues to fail CPU-specific checks but pass the GPU-specific checks.
To open the Check for Run-Time Issues dialog box, click the Check for Issues arrow.
In the Check for Run-Time Issues dialog box, specify a test file or enter code that calls the entry-point function with example inputs. For this example, use the test file mandelbrot_test.m that you used to define the input types.
To enable GPU-specific checks, select the GPU option button. Click Check for Issues.
The app generates a MEX function. It runs the test script mandelbrot_test, replacing calls to mandelbrot_count with calls to the generated MEX. If the app detects issues during the MEX function generation or execution, it provides warning and error messages. You can click these messages to navigate to the problematic code and fix the issue. In this example, the app does not detect issues. The MEX function has the same functionality as the original mandelbrot_count function.
Click Next to go to the Generate Code step.
To open the Generate dialog box, click the Generate arrow.
In the Generate dialog box, you can select the type of build that you want GPU Coder to perform. The available options are listed in this table.
Build Type | Description |
---|---|
Source code | CUDA C Source code to integrate with an external project. |
MEX | Compiled code to run inside MATLAB. |
Static Library | Binary library for static linking with an external project. |
Dynamic Library | Binary library for dynamic linking with an external project. |
Executable | Standalone program (requires a separate main file written in C). |
For this tutorial, set Build type to MEX(.mex). By generating a MEX output, you can check the correctness of the generated CUDA code from within MATLAB. The MEX build type does not require additional settings like Toolchain and Hardware Board. It also does not provide the option to generate only the source code. GPU Coder can automatically select an available CUDA toolchain as long as the Environment Variables are set properly.
To view advanced options, select More Settings. To the Compiler Flags option, add --fmad=false. This flag, when passed to nvcc, instructs the compiler to disable the floating-point multiply-add (FMAD) optimization. This option is set to prevent numerical mismatch in the generated code because of architectural differences between the CPU and the GPU. For more information, see Numerical Differences Between CPU and GPU.
This table describes the settings specific to GPU Coder.
GPU Coder Configuration Properties
UI Setting | Value Type | Description |
---|---|---|
Kernel Name Prefix | | Specify custom name prefix for kernel names in the generated code. For example, entering Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and underscore character _. GPU Coder removes unsupported characters from the kernel names and appends |
Malloc Mode | | Selects the type of GPU memory allocation: |
Malloc Threshold | Integer | Size above which the private variables are allocated on the heap instead of the stack. |
Stack Limit | Integer | Available stack limit per GPU thread. |
Enable cuSOLVER | | Allows GPU Coder to utilize cuSOLVER library calls where appropriate. |
Benchmarking | | Generates CUDA code with benchmarking options such as |
Safe Build | | Generates code with error-checking for CUDA API and kernel calls. |
Minimum Compute Capability | | Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware and is used by applications at run time to determine which hardware features and instructions are available on the present GPU. If you specify custom compute capability, GPU Coder ignores this setting. |
Custom Compute Capability | | Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled. For example, to specify a virtual architecture type |
Compiler Flags | | Pass additional flags to the GPU compiler. For example, For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA toolkit documentation. |
SelectCudaDevice | | In a multi-GPU environment such as NVIDIA Drive platforms, specify the CUDA device to target. |
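The MEX build type and the --fmad=false compiler flag can also be set from the command line. The following is a minimal sketch of that route, assuming GPU Coder is installed and mandelbrot_count.m is in the current folder; the zero-filled xGrid and yGrid are placeholder inputs used only to convey the class and size of the arguments.

```matlab
% Create a GPU code configuration object for a MEX build
cfg = coder.gpuConfig('mex');

% Pass the flag to nvcc to disable floating-point multiply-add contraction
cfg.GpuConfig.CompilerFlags = '--fmad=false';

% Placeholder example inputs: one double scalar and two 1000x1000 double arrays
maxIterations = 500;
xGrid = zeros(1000,1000);
yGrid = zeros(1000,1000);

% Generate mandelbrot_count_mex in the current folder
codegen -config cfg -args {maxIterations,xGrid,yGrid} mandelbrot_count
```

The other settings in the table are exposed as properties of cfg.GpuConfig; check the property names in your release before relying on them.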
Click Generate.
GPU Coder generates the MEX executable mandelbrot_count_mex in your working folder. The <pwd>\codegen\mex\mandelbrot_count folder contains all the other generated files, including the CUDA source (*.cu) and header files.
The GPU Coder app indicates that the code generation succeeded. It displays the source MATLAB files and generated output files on the left side of the page. On the Variables tab, it displays information about the MATLAB source variables. On the Target Build Log tab, it displays the build log, including compiler warnings and errors. By default, in the code window, the app displays the CUDA source file mandelbrot_count.cu. To view a different file, in the Source Code or Output Files pane, click the file name.
To view the code generation report, click View Report. The report provides links to your MATLAB code and the generated CUDA (*.cu) files. It also provides compile-time information for the variables and expressions in your MATLAB code. This information helps you to find sources of error and warnings. It also helps you to debug code generation issues in your code. For more information, see Code Generation Reports (MATLAB Coder).
The GPU Kernels section on the Generated Code tab provides a list of kernels created during GPU code generation. The items in this list link to the relevant source code. For example, when you click mandelbrot_count_kernel1, the code section for this kernel is shown in the code browser window.
After you review the report, you can close the Code Generation Report window. To view the report later, open report.mldatx in the <pwd>\codegen\mex\mandelbrot_count\html folder.
The <pwd>\codegen\mex\mandelbrot_count folder contains the gpu_codegen_info.mat MAT-file, which holds the statistics for the generated GPU code. This MAT-file contains the cuda_Kernel variable that has information about the thread and block sizes, shared and constant memory usage, and input and output arguments of each kernel. The cudaMalloc and cudaMemcpy variables contain information about the size of all the GPU variables and the number of memcpy calls between the host and the device.
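One way to inspect these statistics is to load the MAT-file into a structure and list its contents. This is a small sketch that assumes code generation has already run in the current folder; the variable names infoFile and info are illustrative.

```matlab
% Path to the statistics file written during code generation
infoFile = fullfile(pwd,'codegen','mex','mandelbrot_count','gpu_codegen_info.mat');

if isfile(infoFile)
    info = load(infoFile);     % load into a struct to keep the workspace clean
    disp(fieldnames(info));    % names of the stored variables, such as cuda_Kernel
else
    fprintf('File not found: %s\n', infoFile);
end
```

Loading into a structure avoids overwriting workspace variables that happen to share a name with the stored ones.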
In the GPU Coder app, click Next to open the Finish Workflow page.
The Finish Workflow page indicates that the code generation succeeded. It provides a project summary and links to the MATLAB source files, the code generation report, and the generated output binaries. You can save the configuration parameters of the current GPU Coder project as a MATLAB script. See Convert MATLAB Coder Project to MATLAB Script (MATLAB Coder).
To verify the correctness of the generated MEX file, see Verify Correctness of the Generated Code.
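As a rough first check before working through that topic, you can compare the MEX output against the MATLAB function on the tutorial inputs. The sketch below is illustrative and is not the procedure from the referenced topic; it assumes mandelbrot_count.m is on the path, and it skips the comparison if mandelbrot_count_mex has not been generated.

```matlab
% Rebuild the test inputs (same values as mandelbrot_test.m)
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];
x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

if exist('mandelbrot_count_mex','file') == 3   % 3 indicates a MEX file on the path
    countRef = mandelbrot_count(maxIterations,xGrid,yGrid);      % MATLAB version
    countGpu = mandelbrot_count_mex(maxIterations,xGrid,yGrid);  % generated MEX
    maxDiff = max(abs(countGpu(:) - countRef(:)));
    fprintf('Maximum absolute difference: %g\n', maxDiff);
else
    disp('mandelbrot_count_mex not found; generate the MEX function first.');
end
```

A small nonzero difference does not necessarily indicate a defect, because CPU and GPU floating-point results can differ slightly.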