This example shows how to use the dspunfold function to generate a multi-threaded MEX file from a MATLAB function.
NOTE : The following assumes that the current host computer has at least 2 physical CPU cores. The presented screenshots, speedup, and latency values were collected using a host computer with 8 physical CPU cores.
Required MathWorks™ products:
DSP System Toolbox™
MATLAB® Coder™
dspunfold generates a multi-threaded MEX file from a MATLAB function using DSP unfolding technology. This MATLAB function can contain an algorithm which is stateless (has no states) or stateful (has states).
Consider the MATLAB function spectralAnalysisExample. The function performs the following algorithm:
1) Compute the one-sided spectrum estimate of the input
2) Compute the total harmonic distortion (THD), sigal to noise ratio (SNR), signal to noise and Distortion ratio (SINAD) and the spurious free dynamic range (SFDR) of the spectrum
type spectralAnalysisExample
function [THD,SNR,SINAD,SFDR] = spectralAnalysisExample(x) % Copyright 2015-2016 The MathWorks, Inc. persistent powerSpectrum if isempty(powerSpectrum) powerSpectrum = dsp.SpectrumEstimator('FrequencyRange','onesided',... 'SampleRate',8000,... 'SpectralAverages',1); end % Get one-sided spectrum estimate Pxx = powerSpectrum(x); % Compute measurements [amp, harmSum, totalNoise, maxSpur] = ... getHarmonicDistortion(... getFrequencyVector(powerSpectrum), Pxx, getRBW(powerSpectrum), 6); THD = 10*log10(harmSum/amp(1)); SNR = 10*log10(amp(1)/totalNoise); SINAD = 10*log10(amp(1)/(harmSum + totalNoise)); SFDR = 10*log10(amp(1)/maxSpur);
To accelerate the algorithm, a common approach is to generate a MEX file using the codegen function. Below is an example of how to do so when using an input of 4096 doubles. The generated MEX file, dspunfoldDCTExample_mex, is single-threaded.
codegen spectralAnalysisExample -args {(1:4096)'}
To generate a multi-threaded MEX file, use the dspunfold function. The argument -s indicates that the algorithm in spectralAnalysisExample has no states.
dspunfold spectralAnalysisExample -args {(1:4096)'} -s 0
State length: 0 frames, Repetition: 1, Output latency: 12 frames, Threads: 6 Analyzing: spectralAnalysisExample.m Creating single-threaded MEX file: spectralAnalysisExample_st.mexw64 Creating multi-threaded MEX file: spectralAnalysisExample_mt.mexw64 Creating analyzer file: spectralAnalysisExample_analyzer.p
This will generate the following files:
multi-threaded MEX file, spectralAnalysisExample_mt
single-threaded MEX file, spectralAnalysisExample_st (which is identical to the MEX file obtained using the codegen function)
self-diagnostic analyzer function, spectralAnalysisExample_analyzer
To measure the speedup of the multi-threaded MEX file relative to the single-threaded MEX file, see the example function dspunfoldBenchmarkSpectrumExample:
type dspunfoldBenchmarkSpectrumExample
function dspunfoldBenchmarkSpectrumExample % Function used to measure the speedup of the multi-threaded MEX file % obtained using dspunfold vs the single-threaded MEX file % Copyright 2015 The MathWorks, Inc. clear spectralAnalysisExample_st; % for benchmark precision purpose clear spectralAnalysisExample_mt; % for benchmark precision purpose numFrames = 1e5; inputFrame = (1:4096)'; % exclude first run from timing measurements spectralAnalysisExample_st(inputFrame); tic; % measure execution time for the single-threaded MEX for frame = 1:numFrames spectralAnalysisExample_st(inputFrame); end timeSingleThreaded = toc; % exclude first run from timing measurements spectralAnalysisExample_mt(inputFrame); tic; % measure execution time for the multi-threaded MEX for frame = 1:numFrames spectralAnalysisExample_mt(inputFrame); end timeMultiThreaded = toc; fprintf('Speedup = %.1fx\n',timeSingleThreaded/timeMultiThreaded);
dspunfoldBenchmarkSpectrumExample measures the execution time taken by spectralAnalysisExample_st and spectralAnalysisExample_mt to process 'numFrames' frames. Finally it prints the speedup, which is the ratio between the multi-threaded MEX file execution time and single-threaded MEX file execution time.
dspunfoldBenchmarkSpectrumExample;
Speedup = 3.5x
The speedup could be improved even more by increasing the Repetition value, which will be discussed later.
DSP unfolding generates a multi-threaded MEX file which buffers multiple signal frames and then processes these frames simultaneously, using multiple cores. This process introduces some deterministic output latency. Executing 'help spectralAnalysisExample_mt' displays more information about the multi-threaded MEX file, one of them being the value of the output latency. For this example, the output of the multi-threaded MEX file has a latency of 16 frames relative to its input, which is not the case for the single-threaded MEX file. Below is the plot generated by dspunfoldShowLatencySpectrumExample, which displays the outputs of the single-threaded and multi-threaded MEX files. Notice that the output of the multi-threaded MEX is delayed by 16 frames, relative to that of the single-threaded MEX.
dspunfoldShowLatencySpectrumExample;
When creating a multi-threaded MEX file using dspunfold, the single-threaded MEX file is also created along with an analyzer function. For this example, the name of the analyzer is spectralAnalysisExample_analyzer.
The goal of the analyzer is to provide a quick way to measure the speedup of the multi-threaded MEX relative to the single-threaded MEX, and also check if the outputs of the multi-threaded MEX and single-threaded MEX match. Outputs usually do not match when an incorrect state length value is specified.
The example below executes the analyzer for the multi-threaded MEX file, dspunfoldFIRExample_mt.
Fs = 8000; NumFrames = 10; t = (1/Fs) * (0:4096*NumFrames-1); t = t.'; f = 100; x = sin(2*pi*f*t) + .01 * randn(size(t)); spectralAnalysisExample_analyzer(x)
Analyzing multi-threaded MEX file spectralAnalysisExample_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 12 frames Speedup = 3.7x ans = Latency: 12 Speedup: 3.6814 Pass: 1
Each input to the analyzer corresponds to the inputs of the dspunfoldFIRExample_mt MEX file. Notice that the length (first dimension) of each input is greater than the expected length. For example, dspunfoldFIRExample_mt expects a frame of 4096 doubles for its first input, while 4096*10 samples were provided to spectralAnalysisExample_analyzer. The analyzer interprets this input as 10 frames of 4096 samples. The analyzer alternates between these 10 input frames (in a circular fashion) while checking if the outputs of the multi-threaded and single-threaded MEX files match.
NOTE : For the analyzer to correctly check for the numerical match between the multi-threaded MEX and single-threaded MEX, it is recommended that you provide at least 2 frames with different values for each input.
Let us modify the spectral measurement example by setting the spectral average length of the spectrum estimate to 4 instead of 1. The spectrum estimate is now a running average of the current estimate and the three previous estimate. This algorithm has a state length of 3 frames. The MATLAB function spectralAnalysisWithStatesExample contains the modified algorithm:
type spectralAnalysisWithStatesExample
function [THD,SNR,SINAD,SFDR] = spectralAnalysisWithStatesExample(x) % Copyright 2015-2016 The MathWorks, Inc. persistent powerSpectrum if isempty(powerSpectrum) powerSpectrum = dsp.SpectrumEstimator('FrequencyRange','onesided',... 'SampleRate',8000,... 'SpectralAverages',4); end % Get one-sided spectrum etimate Pxx = powerSpectrum(x); % Compute measurements [amp, harmSum, totalNoise, maxSpur] = ... getHarmonicDistortion(... getFrequencyVector(powerSpectrum), Pxx, getRBW(powerSpectrum), 6); THD = 10*log10(harmSum/amp(1)); SNR = 10*log10(amp(1)/totalNoise); SINAD = 10*log10(amp(1)/(harmSum + totalNoise)); SFDR = 10*log10(amp(1)/maxSpur);
To build the multi-threaded MEX file, we have to provide the state length corresponding to the two FIR filters. Specifying -s 3 when invoking dspunfold indicates that the state length does not exceed 3 frames.
The speedup can be increased even more by increasing the repetition (-r) provided when invoking dspunfold. The default repetition value is 1. Increasing this value makes the multi-threaded MEX buffer more frames internally, before it starts processing them, increasing the efficiency of the multi-threading, but at the cost of a higher output latency. Also note that the maximum state length allowed is (threads-1)*Repetition*FrameSize frames. If the specified state length exceeds that value, dspunfold falls back a single-threaded MEX. If latency may be tolerated by the application, increasing the value of repetition allows generating a multi-threaded MEX with a longer state.
The command below generates a multi-threaded MEX function using a repetition value of 5 and a state length of 3 frames:
dspunfold spectralAnalysisWithStatesExample -args {(1:4096)'} -s 3 -r 5
State length: 3 frames, Repetition: 5, Output latency: 60 frames, Threads: 6 Analyzing: spectralAnalysisWithStatesExample.m Creating single-threaded MEX file: spectralAnalysisWithStatesExample_st.mexw64 Creating multi-threaded MEX file: spectralAnalysisWithStatesExample_mt.mexw64 Creating analyzer file: spectralAnalysisWithStatesExample_analyzer.p
The analyzer may be used to validate the numerical results of the multi-threaded MEX and provide speed-up and latency information:
L = 4096; NumFrames = 10; sine = dsp.SineWave('SamplesPerFrame',L * NumFrames,'SampleRate',8000); x = sine() + 0.01 * randn(L * NumFrames, 1); spectralAnalysisWithStatesExample_analyzer(x)
Analyzing multi-threaded MEX file spectralAnalysisWithStatesExample_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done. Latency = 60 frames Speedup = 3.4x ans = Latency: 60 Speedup: 3.3575 Pass: 1
The function dspunfoldNoisySineExample demonstrates the usage of the multi-threaded MEX to estimate spectral characteristics of a noisy sine wave. The measurements are plotted on a time scope. Performance of the multi-threaded MEX is compared to the MATLAB simulation and the single-threaded MEX performance. The gains of the multi-threaded MEX are still apparent even with the overhead brought by the plotting and input signal generation of the testbench.
dspunfoldNoisySineExample
MATLAB Sim/Single-threaded MEX speedup: 2.3 MATLAB Sim/Multi-threaded MEX speedup: 4.8
DSP unfolding on Wikipedia : Unfolding (DSP implementation)