Generates a multi-threaded MEX file from a MATLAB function
dspunfold generates a multi-threaded MEX file from the entry-point MATLAB® function specified by file, using the unfolding technology.
Unfolding is a technique to improve throughput through parallelization. The multi-threaded
MEX file leverages the multicore CPU architecture of the host computer and can improve speed
significantly. In addition to the multi-threaded MEX file, the function generates a
single-threaded MEX file, a self-diagnostic analyzer function, and the corresponding help
files.
When you invoke dspunfold on an entry-point MATLAB function, dspunfold generates the following files.
File | Value | Description | Examples |
---|---|---|---|
Multi-threaded MEX file | MEX file | Multi-threaded MEX file generated from the entry-point MATLAB function. The MEX file inherits the | |
Help file for the multi-threaded MEX file | MATLAB file | MATLAB help file for the multi-threaded MEX file. The help file has the same name as the MEX file, but with an '.m' extension. To invoke the help file, type This help file displays information on how to invoke the MEX file, its syntax, | |
Single-threaded MEX file | MEX file | Single-threaded MEX file generated from the entry-point MATLAB function. The MEX file inherits the | |
Help file for the single-threaded MEX file | MATLAB file | MATLAB help file for the single-threaded MEX file. The help file has the same name as the MEX file, but with an '.m' extension. To invoke the help file, type The help file displays information on how to invoke the MEX file, its syntax, and types (size, class, and complexity) of the inputs to the MEX file. The syntax to invoke the MEX file should be the same as the syntax shown in the help file. | |
Self-diagnostic analyzer function | P-coded file | The first dimension of the analyzer inputs must be a multiple of the first dimension of the corresponding inputs, given to the The analyzer inherits the | |
Help file for the self-diagnostic analyzer function | MATLAB file | Help file for the self-diagnostic analyzer function. The help file has the same name as the MEX file, but with an '.m' extension. To invoke the help file, type The help file for the self-diagnostic analyzer function displays information on how to invoke the analyzer function, its syntax, and types (size, class, and complexity) of the inputs to the analyzer function. The syntax to invoke the analyzer function should be the same as the syntax shown in the help file. | |
General Limitations:
On Windows and Linux, you must use a compiler that supports the Open Multiprocessing (OpenMP) application interface. See Supported Compilers.
If the input MATLAB function has runtime errors, the errors are not caught when you run the multi-threaded MEX file. Before you use the dspunfold function, call codegen on the MATLAB function and make sure that the MEX file is generated successfully.
If the generated code uses a large amount of memory to store the local variables (around 4 MB on the Windows platform), the generated multi-threaded MEX file can have unexpected behavior. This limit varies with each platform. As a workaround, reduce the size of the input signals or restructure the MATLAB function to use less local memory.
dspunfold does not support:
Analyzer Limitations:
The following limitations apply to the analyzer function generated by the dspunfold function. For more information on the analyzer function, see 'Self-Diagnostic Analyzer' in the 'More About' section of dspunfold.
If multiple frames of the analyzer input are identical, the analyzer might throw false positive pass results. It is recommended that you provide at least two different frames for each input of the analyzer.
If the algorithm in the entry-point MATLAB function chooses its state length based on the input values, the analyzer might provide different pass results for different input values. For an example, see the FIR_Mean function in Why Does the Analyzer Choose the Wrong State Length?.
If the input to the entry-point MATLAB function does not affect the output immediately, the analyzer might throw false positive pass results. For an example, see the Input_Output function in Why Does the Analyzer Choose a Zero State Length?.
If the output results of the multi-threaded MEX file and single-threaded MEX file match statistically but do not match numerically, the analyzer does not pass. Consider the FilterNoise function that follows, which filters a random noise signal with an FIR filter. The function calls randn from within itself to generate random noise. Hence, the output results of the FilterNoise function match statistically but do not match numerically.
function Output = FilterNoise(x)
persistent FIRFilter
if isempty(FIRFilter)
    FIRFilter = dsp.FIRFilter('Numerator',fir1(12,0.4));
end
Output = FIRFilter(x+randn(1000,1));
end
When you run automatic state length detection on FilterNoise, the tool detects an infinite state length. Because the tool cannot find a numerical match for a finite state length, it chooses an infinite state length.
dspunfold FilterNoise -args {randn(1000,1)} -s auto
Analyzing input MATLAB function FilterNoise
Creating single-threaded MEX file FilterNoise_st.mexw64
Searching for minimal state length (this might take a while)
Checking stateless ... Insufficient
Checking 1 ... Insufficient
Checking Infinite ... Sufficient
Checking 2 ... Insufficient
Minimal state length is Inf
Creating multi-threaded MEX file FilterNoise_mt.mexw64
Warning: The multi-threading was disabled due to performance considerations. This happens when the state length is greater than or equal to (Threads-1)*Repetition frames (3 frames in this case).
> In coder.internal.warning (line 8)
In unfoldingEngine/BuildParallelSolution (line 25)
In unfoldingEngine/generate (line 207)
In dspunfold (line 234)
Creating analyzer file FilterNoise_analyzer
The algorithm does not need an infinite state. The state length of the FIR filter, and hence of the algorithm, is 12.
Call dspunfold with the state length set to 12.
dspunfold FilterNoise -args {randn(1000,1)} -s 12 -f true
Analyzing input MATLAB function FilterNoise
Creating single-threaded MEX file FilterNoise_st.mexw64
Creating multi-threaded MEX file FilterNoise_mt.mexw64
Creating analyzer file FilterNoise_analyzer
Run the analyzer function.
FilterNoise_analyzer(randn(1000*4,1))
Analyzing multi-threaded MEX file FilterNoise_mt.mexw64 ...
Latency = 8 frames
Speedup = 0.5x
Warning: The output results of the multi-threaded MEX file FilterNoise_mt.mexw64 do not match the output results of the single-threaded MEX file FilterNoise_st.mexw64. Check that you provided the correct state length value to the dspunfold function when you generated the multi-threaded MEX file FilterNoise_mt.mexw64. For best practices and possible solutions to this problem, see the 'Tips' section in the dspunfold function reference page.
> In coder.internal.warning (line 8)
In FilterNoise_analyzer
ans =
Latency: 8
Speedup: 0.4970
Pass: 0
The analyzer looks for a numerical match and fails the verification, even though the generated multi-threaded MEX file is valid.
Speedup Limitations:
If the entry-point MATLAB function contains code with low complexity, MATLAB overhead or multi-threaded MEX overhead overshadows any performance gains. In such cases, do not use dspunfold.
If the number of operations in the input MATLAB function is small compared to the size of the input or output data, the multi-threaded MEX file does not provide any speedup gain. Sometimes, it can result in a speedup loss, even if the repetition value is increased. In such cases, do not use dspunfold.
General
Do not display plots, scopes, or execute other user interface operations from within the multi-threaded MEX file. The generated MEX file can have unexpected behavior.
Do not use coder.extrinsic inside the input MATLAB function. The generated MEX file can have unexpected behavior.
When the state length is less than or equal to (threads - 1) × repetition frames:
Do not use a random number inside the MATLAB function. The outputs of the single-threaded MEX file and the multi-threaded MEX file might not match. Also, the outputs of the consecutive executions of the multi-threaded MEX file might not match. The analyzer might not pass the numerical match verification.
It is recommended that you generate the random number outside the entry-point MATLAB function and pass it as an argument to the function.
Do not use global or persistent variables anywhere other than in the entry-point MATLAB function. For example, avoid using persistent variables in subfunctions. The generated MEX file can produce inaccurate results. In general, global variables are not recommended.
Do not access I/O resources from within the multi-threaded MEX file. The generated MEX file can have unexpected behavior. These resources include file writers and readers, UDP sockets, and audio players and recorders.
Do not use functions with interactive inputs (for example, the keyboard) inside the multi-threaded MEX file. The generated MEX file can have unexpected behavior.
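The recommendation above to generate random numbers outside the entry-point function can be sketched as follows. This is a minimal illustration, not part of the dspunfold documentation: FilterNoiseInput is a hypothetical variant of the FilterNoise function shown in 'Analyzer Limitations', and the second argument, noise, is an assumed name for the externally generated random signal.

```matlab
function Output = FilterNoiseInput(x, noise)
% Hypothetical variant of FilterNoise: the random noise arrives as an
% input argument instead of being drawn with randn inside the function,
% so repeated runs with the same inputs produce the same outputs.
persistent FIRFilter
if isempty(FIRFilter)
    FIRFilter = dsp.FIRFilter('Numerator', fir1(12, 0.4));
end
Output = FIRFilter(x + noise);
end

% Caller side (run from a script or the command prompt):
%   x     = randn(1000,1);
%   noise = randn(1000,1);     % random numbers generated outside the function
%   y     = FilterNoiseInput(x, noise);
```

With this structure, the single-threaded and multi-threaded MEX files receive identical noise samples, so a numerical comparison of their outputs becomes meaningful.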
Workflow
To generate a valid multi-threaded MEX file with the required speedup and latency, follow the Workflow for Generating a Multithreaded MEX File using dspunfold.
Before using dspunfold, call codegen on the entry-point MATLAB function and make sure that the function generates a MEX file successfully.
After generating the multi-threaded MEX file using dspunfold, run the analyzer function. Make sure that the analyzer function passes. The exception to this rule is when the algorithm produces results that match statistically, but not numerically. In this exception, the analyzer function does not pass, even though the dspunfold function generates a valid multi-threaded MEX file. See 'Analyzer Limitations' for an example.
For help on using the MEX file and analyzer, at the MATLAB command prompt, enter help <mexfile name> and help <analyzer name>.
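The workflow steps above can be sketched as a short command sequence. This is an illustrative sketch under stated assumptions: myAlgorithm is a hypothetical entry-point function that takes one 1000-by-1 input, and the generated file names follow the _mt and _analyzer naming pattern visible in the FilterNoise example output. Running it requires the products that provide codegen and dspunfold.

```matlab
% myAlgorithm is a hypothetical entry-point function used for illustration.

% Step 1: confirm that ordinary MEX generation succeeds before unfolding.
codegen myAlgorithm -args {randn(1000,1)}

% Step 2: generate the multi-threaded MEX file, the single-threaded MEX
% file, and the analyzer, letting dspunfold detect the state length.
dspunfold myAlgorithm -args {randn(1000,1)} -s auto

% Step 3: run the analyzer on several different frames appended along
% the first dimension, and inspect the result.
result = myAlgorithm_analyzer(randn(1000*4,1));
if ~result.Pass
    disp('Analyzer did not pass: review the state length or the algorithm.');
end

% Step 4: display the generated help for the MEX file and the analyzer.
help myAlgorithm_mt
help myAlgorithm_analyzer
```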
State Length
If you choose a state length that is greater than or equal to the value of the exact state length, the analyzer passes. If the analyzer fails, increase the state length, regenerate the MEX file, and verify again.
If the state length is greater than 0, the inputs marked as frames (through the -f option) must all have the same dimensions.
When generating the MEX file and running the analyzer, use inputs that invoke the same state length.
Automatic State Length Detection
When you set -s to auto:
If the algorithm in the entry-point MATLAB function chooses a code path based on the input values, use inputs that choose the code path with the longest state length.
Provide random inputs to -args.
Choose inputs that have an immediate effect on the output. See Why Does the Analyzer Choose a Zero State Length?.
Analyzer
Make sure the outputs of the multi-threaded MEX file and the single-threaded MEX file do not contain NaN or Inf. If they do, the analyzer cannot do numeric checks and returns pass as false. The automatic state length detection tool detects an infinite state length and displays this warning:
Warning: The output results of the multi-threaded MEX file do not match the output results of the single-threaded MEX file even for Infinite state length. A possible reason is that input MATLAB function generates different output results between consecutive runs even for the same input values.
Provide multiple frames with different values for each input of the analyzer. To improve the analyzer effectiveness, append successive frames along the first dimension.
Provide inputs to the analyzer that lead to efficient code coverage.
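One way to build an analyzer input that follows the tips above is sketched below. The frame length of 1000 and the count of 4 frames are assumptions taken from the FilterNoise example (randn(1000*4,1)); the variable names are illustrative only.

```matlab
frameLength = 1000;   % first dimension of the input given to -args (assumed)
numFrames   = 4;      % number of successive frames for the analyzer (assumed)

% Draw every frame independently so that the frames hold different
% values, then append them along the first dimension.
frames = cell(numFrames, 1);
for k = 1:numFrames
    frames{k} = randn(frameLength, 1);
end
analyzerInput = vertcat(frames{:});

% The first dimension of the analyzer input must be a multiple of the
% first dimension of the corresponding MEX file input.
assert(mod(size(analyzerInput, 1), frameLength) == 0);

% Pass analyzerInput to the generated analyzer, for example:
%   FilterNoise_analyzer(analyzerInput)
```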
Speedup
To improve the speedup of the multi-threaded MEX file, specify the exact state length in samples. You can specify the state length in samples by setting at least one entry of frameinputs to true. The use of samples reduces the overhead and increases the speedup.
To increase the speedup at the cost of larger latency, you can:
Increase the repetition factor. Use the -r option.
Increase the number of threads. Use the -t option.
For each input that can be divided into samples without altering the algorithm behavior, set frame status to true using the -f option. The input is then considered in samples, which can increase the speedup of the generated multi-threaded MEX file.
The multi-threaded MEX file buffers multiple input signal frames into a buffer of 2 × threads × repetition frames, where threads is the number of threads, and repetition is the repetition factor. The MEX file processes these frames simultaneously, using multiple cores. This process introduces some deterministic latency, where latency = 2 × threads × repetition. Latency is traded off with the speedup you might gain by increasing the number of threads or the repetition factor.
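The latency relation can be checked with a few lines of arithmetic. The values below (4 threads, repetition factor 1) are assumptions chosen because they reproduce the numbers in the FilterNoise example output: a latency of 8 frames and a multi-threading threshold of 3 frames.

```matlab
threads    = 4;   % assumed number of threads
repetition = 1;   % assumed repetition factor

% Deterministic latency introduced by the buffering, in frames.
latencyFrames = 2 * threads * repetition;

% State length, in frames, at or above which the example warning reports
% that multi-threading is disabled: (Threads-1)*Repetition.
disableThreshold = (threads - 1) * repetition;

% Doubling the repetition factor doubles the latency.
latencyDoubled = 2 * threads * (2 * repetition);

fprintf('Latency: %d frames\n', latencyFrames);
fprintf('Multi-threading disabled at state length >= %d frames\n', disableThreshold);
fprintf('Latency with doubled repetition: %d frames\n', latencyDoubled);
```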