GPU-based System objects look and behave much like the other System objects in the Communications Toolbox™ product. The important difference is that the algorithm is executed on a Graphics Processing Unit (GPU) rather than on a CPU. Using the GPU can accelerate your simulation.
System objects for the Communications Toolbox product are located in the comm package and are constructed as:
H = comm.<object name>
For example, a Viterbi Decoder System object™ is constructed as:
H = comm.ViterbiDecoder
Where a corresponding GPU-based implementation of a System object exists, it is located in the comm.gpu package and is constructed as:
H = comm.gpu.<object name>
For example, a GPU-based Viterbi Decoder System object is constructed as:
H = comm.gpu.ViterbiDecoder
To see a list of available GPU-based implementations, enter help comm at the MATLAB® command line and click GPU Implementations.
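As a small sketch of how the two packages relate, the following selects the GPU-based decoder when a supported GPU is available and falls back to the standard object otherwise. It assumes that the canUseGPU function is available in your MATLAB release and that Parallel Computing Toolbox is installed; the variable name decoder is illustrative only.

```matlab
% Choose a Viterbi decoder implementation based on GPU availability.
% Assumption: canUseGPU is available in your MATLAB release.
if canUseGPU()
    decoder = comm.gpu.ViterbiDecoder;   % GPU-based implementation
else
    decoder = comm.ViterbiDecoder;       % standard (CPU) implementation
end
disp(class(decoder))
```

Because both objects are called the same way, the rest of a simulation can typically use decoder without knowing which implementation was chosen.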
Graphics Processing Units (GPUs) excel at processing large quantities of data and performing computations with high compute intensity. Processing large quantities of data is one way to maximize the throughput of your GPU in a simulation. The amount of data that the GPU processes at any one time depends on the size of the data passed to the input of a GPU System object. Therefore, one way to increase this data size is to process multiple frames of data in a single call.
You can use a single GPU System object to process multiple data frames simultaneously or in parallel. This differs from the way many of the standard, or non-GPU, System objects are implemented. For GPU System objects, the number of frames the objects process in a single call to the object function is either implied by one of the object properties or explicitly stated using the NumFrames property on the objects.
This example shows how to transmit turbo-encoded blocks of data over a BPSK-modulated AWGN channel. Then, it shows how to decode using an iterative turbo decoder and display errors.
Define the noise variance, set the frame length to 256, and create a random stream with a fixed seed so that the results are repeatable. Use the stream to generate the interleaver indices.
noiseVar = 4;
frmLen = 256;
s = RandStream('mt19937ar', 'Seed', 11);
intrlvrIndices = randperm(s, frmLen);
Create a Turbo Encoder System object. The trellis structure for the constituent convolutional code is poly2trellis(4, [13 15 17], 13). The InterleaverIndices property specifies the mapping the object uses to permute the input bits at the encoder as a column vector of integers.
turboEnc = comm.TurboEncoder('TrellisStructure', poly2trellis(4, ...
    [13 15 17], 13), 'InterleaverIndices', intrlvrIndices);
Create a BPSK Modulator System object.
bpsk = comm.BPSKModulator;
Create an AWGN Channel System object.
channel = comm.AWGNChannel('NoiseMethod', 'Variance', 'Variance', ...
    noiseVar);
Create a GPU-based Turbo Decoder System object. The trellis structure for the constituent convolutional code is poly2trellis(4, [13 15 17], 13). The InterleaverIndices property specifies the mapping the object uses to permute the input bits at the encoder as a column vector of integers.
turboDec = comm.gpu.TurboDecoder('TrellisStructure', poly2trellis(4, ...
    [13 15 17], 13), 'InterleaverIndices', intrlvrIndices, ...
    'NumIterations', 4);
Create an Error Rate System object.
errorRate = comm.ErrorRate;
Run the simulation.
for frmIdx = 1:8
data = randi(s, [0 1], frmLen, 1);
encodedData = turboEnc(data);
modSignal = bpsk(encodedData);
receivedSignal = channel(modSignal);
Convert the received signal to log-likelihood ratios for decoding.
receivedBits = turboDec((-2/(noiseVar/2))*real(receivedSignal));
Compare the original data to the received data, and then calculate the error rate results.
errorStats = errorRate(data,receivedBits);
end
fprintf('Error rate = %f\nNumber of errors = %d\nTotal bits = %d\n', ...
    errorStats(1), errorStats(2), errorStats(3))
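The scaling factor -2/(noiseVar/2) applied to the received signal in the loop above can be read as a log-likelihood ratio computation. The derivation below is a sketch under two assumptions that the text does not state explicitly: the BPSK modulator maps bit 0 to +1 and bit 1 to -1, and the complex noise variance noiseVar is split evenly between the real and imaginary parts, so the variance on the real axis is noiseVar/2. The sign convention (positive values favor bit 1) is also an assumption about what the decoder expects.

```latex
% r        : real part of the received sample, real(receivedSignal)
% \sigma^2 : assumed noise variance on the real axis, noiseVar/2
% Assumed mapping: bit 0 -> +1, bit 1 -> -1
\begin{aligned}
L(r) &= \ln\frac{p(r \mid b = 1)}{p(r \mid b = 0)}
      = \frac{(r - 1)^2 - (r + 1)^2}{2\sigma^2} \\
     &= -\frac{2r}{\sigma^2}
      = \frac{-2}{\mathrm{noiseVar}/2}\, r
\end{aligned}
```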
This example shows how to simultaneously process two data frames using an LDPC Decoder System object. The ParityCheckMatrix property determines the frame size. The number of frames that the object processes is determined by the frame size and the input data vector length.
numframes = 2;
ldpcEnc = comm.LDPCEncoder;
ldpcGPUDec = comm.gpu.LDPCDecoder;
ldpcDec = comm.LDPCDecoder;
msg = randi([0 1], 32400, 2);
for ii = 1:numframes
    encout(:,ii) = ldpcEnc(msg(:,ii));
end
% Single-ended to bipolar (for LLRs)
encout = 1-2*encout;
% Decode on the CPU
for ii = 1:numframes
    cout(:,ii) = ldpcDec(encout(:,ii));
end
% Multiframe decode on the GPU
gout = ldpcGPUDec(encout(:));
% Check equality
isequal(gout,cout(:))
This example shows how to process multiple data frames using the NumFrames property of the GPU-based Viterbi Decoder System object. For a Viterbi Decoder, the frame size of your system cannot be inferred from an object property. Therefore, the NumFrames property defines the number of frames present in the input data.
numframes = 10;
convEncoder = comm.ConvolutionalEncoder('TerminationMethod', 'Terminated');
vitDecoder = comm.ViterbiDecoder('TerminationMethod', 'Terminated');
% Create a GPU Viterbi Decoder, using the NumFrames property.
vitGPUDecoder = comm.gpu.ViterbiDecoder('TerminationMethod', 'Terminated', ...
    'NumFrames', numframes);
msg = randi([0 1], 200, numframes);
for ii = 1:numframes
    convEncOut(:,ii) = 1-2*convEncoder(msg(:,ii));
end
% Decode on the CPU
for ii = 1:numframes
    cVitOut(:,ii) = vitDecoder(convEncOut(:,ii));
end
% Decode on the GPU
gVitOut = vitGPUDecoder(convEncOut(:));
isequal(gVitOut,cVitOut(:))
A GPU-based System object accepts typical MATLAB arrays or objects created using the gpuArray class. A GPU-based System object supports input signals with double- or single-precision data types. The output signal inherits its data type from the input signal.
If the input signal is a MATLAB array, the System object handles data transfer between the CPU and the GPU. The output signal is a MATLAB array.
If the input signal is a gpuArray, the data remains on the GPU. The output signal is a gpuArray. When the object is given a gpuArray, calculations take place entirely on the GPU, and no data transfer occurs. Passing gpuArray arguments provides increased performance by reducing simulation time. For more information, see Establish Arrays on a GPU (Parallel Computing Toolbox).
Passing MATLAB arrays to a GPU System object requires transferring the initial data from a CPU to the GPU. Then, the GPU System object performs calculations and transfers the output data back to the CPU. This process introduces latency. When data in the form of a gpuArray is passed to a GPU System object, the object does not incur the latency from data transfer. Therefore, a GPU System object runs faster when you supply a gpuArray as the input.
In general, you should try to minimize the amount of data transfer between the CPU and the GPU in your simulation.
This example shows how to pass a gpuArray to the input of the PSK modulator, reducing latency.
pskGPUModulator = comm.gpu.PSKModulator;
x = randi([0 7], 1000, 1, 'single');
gx = gpuArray(x);
o = pskGPUModulator(x);
class(o)
release(pskGPUModulator); % Allow input types to change
go = pskGPUModulator(gx);
class(go)
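The advice to minimize CPU-to-GPU transfers can be illustrated by chaining several GPU System objects on a gpuArray and gathering the result only once at the end. The following is a minimal sketch, not a benchmark: the 20 dB SNR is an arbitrary value chosen for illustration, the variable names are hypothetical, and the code assumes the default 8-PSK settings on both the modulator and the demodulator.

```matlab
% Sketch: keep the signal on the GPU through modulation, channel, and
% demodulation, and transfer the result back to the CPU once at the end.
% Assumption: the SNR value (20 dB) is arbitrary and only for illustration.
pskMod   = comm.gpu.PSKModulator;
awgnChan = comm.gpu.AWGNChannel('NoiseMethod', ...
    'Signal to noise ratio (SNR)', 'SNR', 20);
pskDemod = comm.gpu.PSKDemodulator;

x  = randi([0 7], 1000, 1);   % 8-PSK symbols (default modulation order)
gx = gpuArray(x);             % single CPU-to-GPU transfer

gTx = pskMod(gx);             % gpuArray in, gpuArray out
gRx = awgnChan(gTx);          % data stays on the GPU
gY  = pskDemod(gRx);          % result is still a gpuArray

y = gather(gY);               % single GPU-to-CPU transfer
numSymbolErrors = nnz(y ~= x)
```

Passing x instead of gx to each object would also work, but every call would then copy its input to the GPU and its output back to the CPU.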
comm.gpu.AWGNChannel
comm.gpu.BlockDeinterleaver
comm.gpu.BlockInterleaver
comm.gpu.ConvolutionalDeinterleaver
comm.gpu.ConvolutionalEncoder
comm.gpu.ConvolutionalInterleaver
comm.gpu.PSKDemodulator
comm.gpu.PSKModulator
comm.gpu.TurboDecoder
comm.gpu.ViterbiDecoder
The GPU System objects must be simulated using Interpreted Execution. You must select this option explicitly on the block mask; the default value is Code generation.