To improve code execution speed, use Single Instruction, Multiple Data (SIMD) instructions, which enable a processor to perform the same operation on multiple data points in parallel. SIMD code uses two kinds of instructions: computational instructions, which perform operations such as arithmetic on data stored in vector registers, and data management instructions, which move and organize data to and from those registers.
SIMD is available on instruction sets such as Intel SSE, Intel AVX, and Inlined ARM NEON Intrinsics. To generate code that contains SIMD instructions, select the appropriate code replacement library. The supported data types are single, double, int32, and int64.
SIMD availability for Simulink™ blocks and the code replacement libraries for the target hardware is shown in this table.
| Arithmetic Operation | Simulink Blocks | Intel SSE | Intel AVX | Intel AVX-512 | Inlined ARM NEON Intrinsics |
| --- | --- | --- | --- | --- | --- |
| Addition | Add | Supports single, double, int32, and int64 | Supports single, double, int32, and int64 | Supports single and double | Supports single |
| Subtraction | Add | Supports single, double, int32, and int64 | Supports single, double, int32, and int64 | Supports single and double | Supports single |
| Multiplication | Product, Gain | Supports single, double, and int32 | Supports single, double, and int32 | Supports single and double | Supports single |
| Division | Divide | Supports single and double | Supports single and double | Supports single and double | Not supported |
| Square Root | Sqrt | Supports single and double | Supports single and double | Supports single and double | Not supported |
| Rounding | Ceil and Floor | Supports single and double | Supports single and double | Not supported | Not supported |
SIMD optimization is available for the For Each block and for MATLAB Function blocks that contain for-loops. SIMD code generation is also supported for some DSP System Toolbox blocks, such as FIR Interpolation (DSP System Toolbox), FIR Decimation (DSP System Toolbox), LMS Filter (DSP System Toolbox), and Discrete FIR Filter. To identify other DSP System Toolbox blocks that support SIMD code generation, see the Extended Capabilities section of each block.
In the Configuration Parameters dialog box, select the required Device vendor and Device type. To enable SIMD, on the Interface pane, set the Code replacement library parameter: click Select and add the required code replacement libraries to the Selected code replacement libraries - prioritized list. This table shows the code replacement libraries for the supported Device vendor and Device type values.
| Device Vendor | Device Type | Code Replacement Library |
| --- | --- | --- |
| Intel or AMD | x86-64 (Windows 64) | Intel SSE (Windows), Intel AVX (Windows), Intel AVX-512 (Windows) |
| Intel or AMD | x86-64 (Linux 64) | Intel SSE (Linux), Intel AVX (Linux), Intel AVX-512 (Linux) |
| ARM Compatible | ARM Cortex-A | Inlined ARM NEON Intrinsics |
Alternatively, you can set the code replacement library from the command line. For example, to choose the Intel SSE (Windows) library for the currently open model myExampleModel, set the CodeReplacementLibrary parameter:
set_param('myExampleModel','CodeReplacementLibrary','Intel SSE (Windows)')
Consider a model that has two Divide blocks, one with an input data type of single and the other with an input data type of double.
Generate code without adding a code replacement library to the Selected code replacement libraries - prioritized list. The generated code executes the loop one iteration at a time.
    void mDiv_step(void)
    {
      int32_T i;
      for (i = 0; i < 140; i++) {
        mDiv_Y.Out2[i] = mDiv_U.In1[i] / mDiv_U.In2[i];
        mDiv_Y.Out3[i] = mDiv_U.In5[i] / mDiv_U.In6[i];
      }
    }
Generate code containing SIMD instructions by adding the appropriate code replacement library to the Selected code replacement libraries - prioritized list. This generated code uses the Intel SSE (Windows) code replacement library.
    void mDiv_step(void)
    {
      int32_T idx;
      for (idx = 0; idx <= 136; idx += 4) {
        _mm_storeu_ps(&mDiv_Y.Out2[idx],
          _mm_div_ps(_mm_loadu_ps(&mDiv_U.In1[idx]),
                     _mm_loadu_ps(&mDiv_U.In2[idx])));
      }
      for (idx = 0; idx <= 138; idx += 2) {
        _mm_storeu_pd(&mDiv_Y.Out3[idx],
          _mm_div_pd(_mm_loadu_pd(&mDiv_U.In5[idx]),
                     _mm_loadu_pd(&mDiv_U.In6[idx])));
      }
    }
The computational instructions _mm_div_ps and _mm_div_pd perform the divisions on the vector registers, which improves the execution speed of the generated code when it is deployed on the target hardware. The data management instructions _mm_storeu_ps and _mm_loadu_ps store data to and load data from the SIMD registers. For the Divide block with the data type double, the loop executes in increments of two. For the Divide block with the data type single, the loop executes in increments of four. For a list of Intel intrinsic functions for supported Simulink blocks, see https://software.intel.com/sites/landingpage/IntrinsicsGuide/. For a list of Inlined ARM NEON Intrinsics functions, see https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics.
The generated code is not optimized through SIMD if:
The code in a MATLAB Function block contains scalar data types outside the body of loops. For instance, if a, b, and c are scalars, the generated code does not optimize an operation such as c = a + b.
The code in a MATLAB Function block contains indirectly indexed arrays or matrices. For instance, if A, B, C, and D are vectors, the generated code is not vectorized for an operation such as D(A) = C(A) + B(A).
The Simulink model contains a reusable subsystem. The blocks within the reusable subsystem might not be optimized.
The code in a MATLAB Function block contains parallel for-loops (parfor). The parfor loop itself is not optimized, but any loops within the body of the parfor loop can be vectorized.
The Partition Dimension parameter of a For Each subsystem is below the Loop unrolling threshold configuration parameter.