HDL Optimized QAM Transmitter and Receiver

This example shows how to implement a 64-QAM transmitter and receiver for HDL code generation and hardware implementation.

Overview

The HDL Optimized QAM Transmitter and Receiver example shows how to use Simulink® blocks that support HDL code generation to implement the baseband processing of a digital communications transmitter and receiver.

The HDL QAM Tx subsystem generates a complex-valued, 64-QAM modulated constellation. A floating-point channel model, Channel, adds attenuation, channel noise, carrier frequency offset and fractional delay in order to demonstrate the operation of the receiver subsystem. The HDL QAM Rx subsystem implements a practical digital receiver that mitigates the channel impairments using coarse frequency recovery, timing recovery, frame synchronization, and magnitude and phase recovery. The received data packets are then decoded and printed to the MATLAB® Command Window by the Text Message Decoding subsystem.

Structure of the Example

The top-level structure of the QAM receiver model is shown in the following figure. The QAM Tx HDL and QAM Rx HDL subsystems have been optimized for HDL code generation.

The detailed structure of the QAM Tx HDL subsystem can be seen in the figure below.

The QAM Tx HDL subsystem contains the following components, which are described in more detail in the HDL Optimized QAM Transmitter section.

  • Data Generation & Packetization - Generates the packets to be transmitted, grouping the bits for mapping to symbols

  • Symbol Mapping - Maps the bits output from the Data Generation & Packetization subsystem to QAM symbols

  • Pulse Shaping - Performs pulse shaping and upsampling of the symbols using an interpolating RRC (Root Raised Cosine) filter prior to transmission

The structure of the Channel can be seen below. The Channel subsystem is a rough approximation of an AWGN channel with attenuation and frequency offset, and it is intended to be run in software. As a result, blocks that are not supported for HDL code generation, such as the Phase/Frequency Offset block, can be used here. The Phase/Frequency Offset block does not support fixed-point data types, hence the conversion to double at the input of the Channel subsystem. The signal is converted back to fixed point before being output from the Channel subsystem. A fractional delay and AWGN are applied to the transmitted signal, and the Gain block attenuates the signal.

The detailed structure of the QAM Rx HDL subsystem can be seen in the figure below.

The QAM Rx HDL subsystem contains the following components which are described in more detail in the HDL Optimized QAM Receiver section.

  • Automatic Gain Control (AGC) - Normalizes the received signal power

  • Coarse Frequency Offset Correction - Estimates the approximate frequency offset and corrects for it. The subsystem also contains the receive RRC filter, which downsamples by 2

  • Timing Recovery - Resamples the input signal according to a recovered timing strobe so that symbol decisions are made at the optimum sampling instants

  • Magnitude & Phase Recovery - Performs packet detection and fine-grained phase and amplitude correction

  • Demodulate - Demodulates the signal, de-mapping symbols to bits

The structure of the Text Message Decoding subsystem is shown below.

This subsystem is expected to run in software, so frame-based signals are preferred in order to speed up the computation. The Text Message Decoding subsystem has eight sample-based Boolean input signals: dValid, packetStart and signals bit1 to bit6. The dataframer MATLAB function block converts the sample-based signals to their frame-based counterparts. The demodulated bits are valid only when dValid is high. The dataframer block uses the dValid signal to fill a delay line with the received bits, and the newPacket signal to forward the data stored in the delay line to the output and reset the delay line. The Descramble and Print subsystem processes the received data only when its enable signal goes high. This occurs when either the delay line has accumulated 336 valid demodulated bits or the newPacket signal is high, either of which causes the dataframer to set the RxGo signal high. While the simulation is running, the Descramble and Print subsystem outputs the string "Hello world! ~64QAM test string~ ###" to the MATLAB Command Window, where '###' is a repeating sequence of '000', '001', '002', ..., '099'. Every 50 packets, the bit error rate of the data in the last 50 successfully received packets is also displayed in the MATLAB Command Window.
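
The framing behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the dataframer block's actual code: the class name `DataFramer`, the method `step`, and the order in which the flush and the append happen within one sample are assumptions, and the output here is a variable-length list rather than a fixed-size frame.

```python
FRAME_BITS = 336  # 84 preamble bits + 252 data bits, as stated in the text


class DataFramer:
    """Hypothetical sketch: collect valid bits and release them as one frame."""

    def __init__(self):
        self.buffer = []  # plays the role of the delay line

    def step(self, d_valid, new_packet, bits):
        """Process one sample (six bits); return (rx_go, frame)."""
        rx_go, frame = False, []
        if new_packet:
            # Assumed ordering: forward what is stored, then reset the delay line.
            frame, self.buffer = self.buffer, []
            rx_go = True
        if d_valid:
            self.buffer.extend(bits)
        if len(self.buffer) >= FRAME_BITS:
            frame = self.buffer[:FRAME_BITS]
            self.buffer = self.buffer[FRAME_BITS:]
            rx_go = True
        return rx_go, frame
```

With six bits per sample, 56 consecutive valid samples fill the delay line, so `rx_go` goes high on the 56th call to `step`.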

HDL Optimized QAM Transmitter (HDL QAM Tx)

The HDL Optimized Transmitter contains the Data Generation & Packetization, Symbol Mapping, and Pulse Shaping blocks which are described in detail in the following sections.

1 - Data Generation & Packetization

The Controller FSM (Finite State Machine) and Data Source generate the preamble bits and the data bits, perform scrambling and build the packets. Each packet consists of an 84-bit Barker code preamble and 252 bits of scrambled data. The Group Bits block converts the input data bit stream into a six-bit integer at 1/6th of the input sampling rate, as required by the symbol mapper.

The Data Source subsystem has a pipeline delay of 2 samples. In addition, there is a pipeline delay between the Data Source and the Group Bits subsystem. The valid signal is therefore delayed to match the pipeline delay of the data path. The Group Bits subsystem reduces the sample rate by a factor of 6. Placing a downsample by 6 in the valid control path ensures that its sample rate matches that of the signal path.
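
As a quick check of the packet arithmetic, the following Python sketch computes the packet size in bits and symbols and packs a bit stream into six-bit integers. The function name `group_bits` and the MSB-first bit order are assumptions made for illustration; the text does not state the order used by the Group Bits subsystem.

```python
PREAMBLE_BITS = 84
DATA_BITS = 252
BITS_PER_SYMBOL = 6  # 64-QAM carries log2(64) = 6 bits per symbol


def group_bits(bits, width=BITS_PER_SYMBOL):
    """Pack a flat bit list into unsigned integers of `width` bits (MSB first, assumed)."""
    if len(bits) % width:
        raise ValueError("bit count must be a multiple of the group width")
    words = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | (bit & 1)
        words.append(value)
    return words


packet_bits = PREAMBLE_BITS + DATA_BITS                  # bits per packet
symbols_per_packet = packet_bits // BITS_PER_SYMBOL      # symbols per packet
preamble_symbols = PREAMBLE_BITS // BITS_PER_SYMBOL      # preamble symbols
```

Under these numbers a packet is 336 bits, or 56 symbols, of which 14 are preamble symbols. Because the output has one word per six input bits, the output rate is 1/6th of the input rate.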

  • Controller FSM - The Controller FSM implements a control state machine using a MATLAB™ function block. The FSM has two states: Pack_Preamble and Append_Data. The Pack_Preamble state asserts the load_preamble signal and de-asserts the reset_preamble and load_data signals. The FSM remains in this state for 84 clock cycles. The FSM then moves into the Append_Data state, asserting the load_data and reset_preamble signals while releasing the load_preamble signal. The FSM remains in this state for 252 clock cycles. The load_preamble and reset_preamble signals are Boolean and control the Preamble Address Counter, which manages the loading of the preamble at the start of each packet. The load_data signal is Boolean and enables the Data Address Counter, which controls the loading of data into the packet.

  • Data Source - The Data Source subsystem contains two LUTs, storing the preamble and data bits. The preamble LUT is addressed by the Preamble Address Counter, which is controlled by the reset_preamble and load_preamble signals generated by the Controller FSM. The data LUT is addressed by the Data Address Counter, which is enabled by the load_data signal generated by the Controller FSM. The Preamble Address Counter has a reset signal, generated by the Controller FSM, because the same preamble is inserted at the start of each packet. The Data Address Counter does not have a reset signal because the data address sequence is much longer and varies for each packet as different data bits are placed within each packet. In addition to enabling the counter for the data LUT, the load_data input controls when the HDL Data Scrambler component is enabled, and controls the selection of preamble or data bits via the Preamble Data Mux.

  • HDL Data Scrambler - The HDL Data Scrambler is shown in the following figure. It is built from first principles using XOR gates (for modulo-2 addition) and registers. An enabled subsystem is used here to ensure that the scrambler is only enabled when there is new input data to be processed.

  • Group Bits - The purpose of the Group Bits subsystem is to group six individual bits into a six-bit unsigned integer output - the format expected by the symbol mapping component. A number of delays are used to align 6 bits at the input of the Bit Concat block which concatenates into a six-bit unsigned output. This output is then downsampled to select the correct grouping of bits.
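
The two-state control sequence described for the Controller FSM can be sketched as a Python generator. This is a behavioural illustration only, not the MATLAB function block's code; the function name `controller_fsm` and the tuple layout are assumptions.

```python
PREAMBLE_CYCLES = 84
DATA_CYCLES = 252


def controller_fsm(num_cycles):
    """Yield (state, load_preamble, reset_preamble, load_data) for each clock cycle."""
    state, count = "Pack_Preamble", 0
    for _ in range(num_cycles):
        if state == "Pack_Preamble":
            # Preamble phase: load the preamble, keep the data path idle.
            yield state, True, False, False
            limit = PREAMBLE_CYCLES
        else:
            # Data phase: load data and hold the preamble counter in reset.
            yield state, False, True, True
            limit = DATA_CYCLES
        count += 1
        if count == limit:
            state = "Append_Data" if state == "Pack_Preamble" else "Pack_Preamble"
            count = 0


# Two full packets worth of control signals.
trace = list(controller_fsm(2 * (PREAMBLE_CYCLES + DATA_CYCLES)))
```

In this sketch the state changes on cycle 84 and returns to Pack_Preamble on cycle 336, so each packet occupies 336 clock cycles at the bit rate.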

2 - Symbol Mapping

The Symbol Mapping subsystem uses the Rectangular QAM Modulator Baseband block to map the integer input value onto the appropriate 64-QAM complex valued symbol. The block uses a Gray Mapping scheme.
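
To illustrate what Gray mapping means for a 64-QAM constellation, the sketch below builds a Gray-coded mapping table in Python. It is a generic construction, not necessarily the bit ordering used by the Rectangular QAM Modulator Baseband block: placing the in-phase index in the upper three bits is an assumption, and the minimum distance of 0.25 is inferred from the Demodulate section (2 divided by 8).

```python
LEVELS = 8           # 8 amplitude levels per axis for 64-QAM
MIN_DISTANCE = 0.25  # assumed: the demodulator distance of 2 divided by 8


def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def build_constellation():
    """Map each six-bit word to a complex symbol (hypothetical bit ordering)."""
    table = {}
    for i_idx in range(LEVELS):
        for q_idx in range(LEVELS):
            word = (gray(i_idx) << 3) | gray(q_idx)
            i_val = (2 * i_idx - (LEVELS - 1)) * MIN_DISTANCE / 2
            q_val = (2 * q_idx - (LEVELS - 1)) * MIN_DISTANCE / 2
            table[word] = complex(i_val, q_val)
    return table


points = build_constellation()
```

The property that matters is that any two symbols separated by the minimum distance differ in exactly one bit, so the most likely symbol errors cause a single bit error.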

3 - Pulse Shaping

The Pulse Shaping subsystem uses an RRC Interpolation Filter block with an upsampling factor of 4. A matched filter is implemented in the receiver. The filter is pipelined (see HDL Block Properties).

HDL Optimized QAM Receiver (HDL QAM Rx)

The HDL Optimized Receiver contains the AGC, Coarse Frequency Offset Correction, Timing Recovery, Magnitude & Phase Recovery, and Demodulate blocks, which are described in detail in the following sections.

1 - AGC

The AGC ensures that the amplitude of the input to the Coarse Frequency Offset Correction subsystem is normalized to the range -1 to 1.

The AGC structure is shown in the following diagram, with pipeline registers shown in green throughout the model.

2 - Coarse Frequency Offset Correction

The Coarse Frequency Offset Correction subsystem estimates and corrects for the frequency offset using the Luise-Reggiannini algorithm [ 1 ]. The Frequency Offset Estimation subsystem makes an estimate based on the output of the Root Raised Cosine Receive Filter, then frequency offset correction based on this estimate is applied at the input to the Root Raised Cosine Receive Filter. This ensures that the desired portion of the received signal bandwidth is better aligned with the receiver filter frequency response, improving the SNR compared to correcting at the output of the Root Raised Cosine Receive Filter.

As the estimation and correction algorithm is operating in a closed loop, making iterative updates to the previous estimates of the frequency offset, the system will gradually converge towards a result. A Loop Gain is included to implement averaging of the estimates. This architecture is described in [ 1 ]. The Root Raised Cosine Receive Filter implements a downsampling operation so it is necessary to upsample the feedback signal, using the repeat block, to match the rate at the input to the filter.

Note that there is a residual frequency offset at the output of the Coarse Frequency Offset Correction subsystem that varies over time as new estimates of the offset are made, even if the frequency offset at the input to the subsystem remains the same. Fine-grained correction of the residual offset is performed later in the receiver by the Magnitude & Phase Recovery subsystem.

  • Frequency Offset Estimation : The Frequency Offset Estimation subsystem implements the Luise-Reggiannini algorithm, described in [ 1 ]. The signal is first raised to the fourth power to implement a 4th-power phase estimator as described in [ 2 ]. This is implemented by 2 cascaded Product blocks, with pipelining added to improve hardware performance. The Discrete FIR Filter implements the filter with rectangular weights, made up of all ones, described in [ 1 ]. The FIR Scale scales the FIR output to account for the filter gain. The Complex To Magnitude-Angle HDL Optimized block is used to implement the $angle$ function, as required by the Luise-Reggiannini algorithm. This block computes the phase using the hardware-friendly CORDIC algorithm. For more information, see the Complex to Magnitude-Angle HDL Optimized (DSP System Toolbox) block in DSP System Toolbox™. Before the Frequency Offset Estimation subsystem output, the signal is scaled as required by the Luise-Reggiannini algorithm and, in addition, is scaled to match the word length of the NCO.
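
The idea behind the estimator can be sketched in floating-point Python. This is the open-loop textbook form of the Luise-Reggiannini estimator applied to the fourth power of the signal, not the closed-loop fixed-point structure used in the model. The function names and the `max_lag` parameter are hypothetical, and the result is a normalized frequency in cycles per sample.

```python
import cmath
import math


def autocorr(z, lag):
    """Sample autocorrelation of z at the given lag."""
    terms = [z[k] * z[k - lag].conjugate() for k in range(lag, len(z))]
    return sum(terms) / len(terms)


def luise_reggiannini(z, max_lag):
    """Estimate the normalized frequency of z (cycles/sample) from lags 1..max_lag."""
    total = sum(autocorr(z, m) for m in range(1, max_lag + 1))
    return cmath.phase(total) / (math.pi * (max_lag + 1))


def fourth_power_offset(received, max_lag):
    """Strip the modulation with a 4th power, then undo the resulting 4x frequency scaling."""
    z = [r ** 4 for r in received]
    return luise_reggiannini(z, max_lag) / 4


# Noise-free sanity check using only the four corner symbols, whose 4th power is constant.
corners = [0.875 * c for c in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)]
true_offset = 0.005
rx = [corners[k % 4] * cmath.exp(2j * math.pi * true_offset * k) for k in range(200)]
estimate = fourth_power_offset(rx, max_lag=8)
```

The sum of autocorrelation terms acts like the all-ones FIR filter mentioned above, and `cmath.phase` plays the role of the $angle$ function. With the full 64-QAM constellation and channel noise the estimate is noisier, which is one reason the model averages successive estimates in a closed loop.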

3 - Timing Recovery

The Timing Recovery subsystem is shown in the following diagram.

The Timing Recovery subsystem implements a PLL, described in Chapter 8 of [ 3 ], to correct the timing error in the received signal. On average, the Timing Recovery subsystem generates one output sample for every two input samples.

The Interpolation Control function block implements a decrementing modulo-1 counter, described in Chapter 8.4.3 of [ 3 ], to generate the control signal to facilitate the selection of the interpolants of the Interpolation Filter. This control signal also enables the Timing Error Detector (TED), so that it calculates the timing errors at the correct timing instants. The Interpolation Control subsystem updates the timing difference, mu, for the Interpolation Filter, generating interpolants at optimum sampling instants.

The Interpolation Filter is a Farrow parabolic filter with $\alpha=0.5$ as described in Chapter 8.4.2 of [ 3 ]. The filter uses an $\alpha$ of 0.5 so that all the filter coefficients become 1, -1/2 and 3/2, which significantly simplifies the interpolator structure. Based on the interpolants, timing errors are generated by a zero-crossing Timing Error Detector as described in Chapter 8.4.1 of [ 3 ].
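
A floating-point sketch of a piecewise-parabolic interpolator with $\alpha=0.5$ is shown below. The coefficient polynomials follow the commonly quoted form of this interpolator and are assumed to match the description in [ 3 ]; the function name and the indexing convention (interpolating between `x[m]` and `x[m + 1]`) are choices made for this illustration.

```python
ALPHA = 0.5


def farrow_parabolic(x, m, mu, alpha=ALPHA):
    """Interpolate between x[m] and x[m + 1] at fractional offset mu in [0, 1]."""
    w_next2 = alpha * mu * mu - alpha * mu             # weight on x[m + 2]
    w_next1 = -alpha * mu * mu + (1 + alpha) * mu      # weight on x[m + 1]
    w_curr = -alpha * mu * mu + (alpha - 1) * mu + 1   # weight on x[m]
    w_prev = alpha * mu * mu - alpha * mu              # weight on x[m - 1]
    return (w_next2 * x[m + 2] + w_next1 * x[m + 1]
            + w_curr * x[m] + w_prev * x[m - 1])


ramp = [float(n) for n in range(8)]
quarter = farrow_parabolic(ramp, 3, 0.25)  # a point one quarter of the way from x[3] to x[4]
```

With $\alpha=0.5$ the polynomial coefficients are all 0, 1, ±1/2 or 3/2, so the multiplications by constants reduce to shifts and adds in hardware. The four weights always sum to 1, and a linear ramp is interpolated exactly.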

The Interpolation Filter introduces a fractional delay to the signal in order to compensate for the timing error. The fractional delay is controlled by the mu input signal. When the timing error (delay) reaches symbol boundaries, there is one extra or missing interpolant in the output. The Timing Error Detector implements bit stuffing or skipping to handle the extra or missing interpolants.

Refer to Chapter 8.4.4 of [ 3 ] for details of bit stuffing and skipping. The timing recovery loop normally generates one output symbol for every two input samples. It also outputs a timing strobe (validOut signal) that runs at the input sample rate. Under normal circumstances, the strobe value is simply a sequence of alternating ones and zeros. However, this occurs only when the relative delay between transmitter and receiver contains some fractional part of one symbol period and the integer part of the delay (in symbols) remains constant. If the integer part of the relative delay changes, the strobe value can have two consecutive zeros or two consecutive ones.
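
The strobe behaviour can be reproduced with a small Python sketch of a decrementing modulo-1 counter. This is a simplified illustration in which the counter step is a constant; in a real loop the step is the nominal value of 1/2 plus the loop filter output. The function name and the starting value `eta0` are hypothetical.

```python
def interpolation_control(num_samples, step=0.5, eta0=0.3):
    """Return (strobes, mus) for a decrementing modulo-1 counter with a fixed step."""
    strobes, mus = [], []
    eta = eta0
    for _ in range(num_samples):
        underflow = eta < step            # the counter wraps on this sample
        strobes.append(1 if underflow else 0)
        if underflow:
            mus.append(eta / step)        # fractional interval passed to the interpolator
        eta = (eta - step) % 1.0
    return strobes, mus


nominal_strobes, nominal_mus = interpolation_control(10)
drifting_strobes, _ = interpolation_control(40, step=0.52)
```

With a step of exactly 1/2 the strobe alternates between one and zero and mu stays constant. With a slightly larger step, mu drifts from strobe to strobe until the counter underflows on two consecutive samples, which produces the two consecutive ones described above.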

4 - Magnitude & Phase Recovery

The Magnitude & Phase Recovery subsystem performs packet synchronization, fine-grained frequency recovery and fine-grained amplitude recovery.

  • Packet Synchronization: The Preamble Matched Filter subsystem uses the time-reversed complex conjugate of the preamble as the filter weights. The modulus of the output of the Preamble Matched Filter subsystem is calculated using the Modulus subsystem. The output of the Modulus subsystem is then compared to a threshold to detect the preamble at the start of a packet. The MATLAB function block generates a signal, isPreamble, which is held high for the duration of the preamble of each packet. The MATLAB function block also generates the dValid signal, which is set high for the duration of the packet when a preamble has been detected.

  • Fine Grained Magnitude and Phase Recovery : The 1-Tap DLMS (Delayed Least Mean Squares) filter subsystem, adapting over the preamble and using the reference signal generated by Desired Signal Source, corrects for both phase and magnitude errors. The isPreamble signal, generated by the MATLAB function block and set high for the 14 preamble symbols once a packet has been detected, is used to enable the desired signal source and to enable the Adapt input of the 1-Tap DLMS. When the isPreamble signal is low, the weight in the 1-Tap DLMS is held and the Desired Signal Source is reset. The Delayed LMS (DLMS) [ 4 ] algorithm is used here to allow for more pipelining to be introduced and, therefore, reduce the critical path in the filter and increase the maximum clock rate achievable after being implemented in hardware.
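
The single-tap delayed LMS update can be sketched in floating-point Python as follows. This illustrates the general technique from [ 4 ], not the subsystem's fixed-point implementation: the function name, the step size, the delay value and the initial weight of 1 are all assumptions.

```python
import cmath
from collections import deque


def dlms_one_tap(received, desired, step, delay):
    """Adapt one complex weight w so that w * received tracks desired."""
    w = 1 + 0j
    pending = deque()  # (error, input) pairs waiting `delay` samples before use
    outputs = []
    for r, d in zip(received, desired):
        y = w * r
        outputs.append(y)
        pending.append((d - y, r))
        if len(pending) > delay:
            # The update uses an error that is `delay` samples old, which is
            # what allows pipeline registers inside the adaptation loop.
            e_old, r_old = pending.popleft()
            w += step * e_old * r_old.conjugate()
    return w, outputs


# Hypothetical channel: a fixed gain of 0.8 and a phase rotation of 0.3 rad.
channel = 0.8 * cmath.exp(0.3j)
reference = [c / abs(c) for c in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)]
desired = [reference[k % 4] for k in range(400)]
received = [channel * d for d in desired]
weight, _ = dlms_one_tap(received, desired, step=0.1, delay=2)
```

After adaptation the weight approaches the inverse of the channel coefficient, so a single complex multiplication corrects both the magnitude and the phase error. A larger delay requires a smaller step size to keep the loop stable.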

The internal structure of the Desired Signal Source subsystem is shown below. The data lookup LUT contains the preamble symbols.

The internal structure of the 1-Tap DLMS subsystem is shown below.

5 - Demodulate

The Demodulate subsystem maps each 64-QAM input symbol to bits, outputting 6 bits for each input symbol. To generate HDL for the Rectangular QAM Demodulator Baseband block, the minimum distance between symbols must be set to 2. This is 8 times larger than the distance between the symbols generated in the transmitter. As a result, the symbols input to the Demodulate subsystem must be scaled up by a factor of 8. This is done using the Shift Arithmetic block, which shifts the bits left by 3 positions relative to the binary point to achieve the required multiplication by 8.
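
The scaling arithmetic can be checked with a small Python sketch that models a fixed-point value as an integer with an implied binary point. The fraction length of 10 bits and the helper names are hypothetical; the transmit levels assume a minimum distance of 2/8 = 0.25.

```python
FRACTION_BITS = 10  # hypothetical fraction length, for illustration only


def to_fixed(x):
    """Quantize a real value to a stored integer with FRACTION_BITS fractional bits."""
    return round(x * (1 << FRACTION_BITS))


def to_real(stored):
    """Interpret a stored integer using the same binary point position."""
    return stored / (1 << FRACTION_BITS)


def scale_by_8(stored):
    """A 3-bit left shift of the stored integer multiplies the value by 8."""
    return stored << 3


def slice_level(x):
    """Hard decision to the nearest level in {-7, -5, ..., 5, 7}."""
    index = min(7, max(0, int((x + 8) // 2)))
    return 2 * index - 7


tx_levels = [(2 * k - 7) * 0.125 for k in range(8)]  # spacing of 0.25
rx_levels = [to_real(scale_by_8(to_fixed(v))) for v in tx_levels]
```

After the shift, the eight levels per axis land on the odd integers from -7 to 7, which have the spacing of 2 that the demodulator expects.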

Results and Displays

During the simulation, the model displays successfully received packets in the MATLAB Command Window. Every 50 packets, the bit error rate of the data in the last 50 successfully received packets is also displayed in the MATLAB Command Window.

After running the simulation, the model displays six different figures illustrating different aspects of the receiver performance. These are shown below, along with an explanation of each plot. The first five plots show the adaption, over the simulation duration, of the Automatic Gain Control, the Frequency Offset Estimation, the Timing Recovery position estimate, the real part of the constellation at the output of the Timing Recovery subsystem, and the real part of the constellation at the output of the Magnitude & Phase Recovery subsystem. The last plot shows the constellation diagram at the output of the Magnitude & Phase Recovery subsystem after any adaption has taken place.

  • AGC Gain Plot

The following plot illustrates the Automatic Gain Control subsystem adapting over time to normalize the output. A balance must be struck between how quickly the AGC adapts and how much ripple there is after the gain has reached a relatively constant level. Using a larger AGC loop gain adapts faster but the amplitude after adaption varies more. Using a smaller loop gain slows the adaption of the AGC, smoothing the level after adaption but taking longer to adapt.
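
The speed side of this trade-off can be illustrated with a minimal feedback AGC in Python. The loop below is a generic first-order AGC chosen for illustration, not the structure of the AGC subsystem in the model; the function names, the loop gain values and the input amplitude of 0.2 are all assumptions.

```python
def run_agc(samples, loop_gain, reference=1.0):
    """Adjust a gain so that the output magnitude approaches `reference`."""
    gain = 1.0
    history = []
    for x in samples:
        y = gain * x
        gain += loop_gain * (reference - abs(y))  # error-driven gain update
        history.append(gain)
    return history


def settle_index(history, target, tolerance=0.01):
    """First sample index at which the gain is within `tolerance` of `target`."""
    for n, g in enumerate(history):
        if abs(g - target) <= tolerance * target:
            return n
    return None


# Constant-amplitude input of 0.2 with alternating sign, so the ideal gain is 5.
signal = [0.2 if n % 2 == 0 else -0.2 for n in range(1000)]
fast = run_agc(signal, loop_gain=0.5)
slow = run_agc(signal, loop_gain=0.05)
```

In this sketch the larger loop gain reaches the target gain roughly ten times sooner. With a noisy input, the same larger loop gain would also pass more of the amplitude fluctuation into the gain, which is the ripple described above.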

  • Frequency Offset Estimate Plot

The following plot illustrates how the coarse frequency offset gradually adapts towards the frequency offset introduced by the system (the blue horizontal line). It shows that while the estimate comes close to the actual frequency offset, there is still a residual error that must be addressed later in the system.

  • Timing Recovery Position Plot

The following plot shows the mu input to the Interpolation Filter. Note that mu converges to a steady state (with some ripple) over time as the channel delay is not varying during the simulation.

  • Real Part of Timing Recovery Output Plot

The following plot illustrates how the real part of the Timing Recovery subsystem output begins to converge towards the eight distinct amplitude levels expected for 64-QAM. However, because the residual frequency offset remaining after the coarse frequency recovery has not yet been corrected at this point in the receiver, the quality of the signal varies, with the distinct amplitude levels more clearly visible at some points than at others. The constellation still has some rotation at this point in the receiver.

  • Real Part of Symbol Estimates Plot

The following plot shows how the real part of the output of the Magnitude & Phase Recovery subsystem adapts over time. Unlike the previous plot, this diagram is generated after the fine frequency recovery, so the constellation should not be rotating. There are no samples initially because the output from the block is not yet valid. After that, eight clear amplitude levels should be visible, representing the eight real amplitude levels of the 64-QAM constellation.

  • Recovered Constellation Plot

The following plot shows the constellation at the output of the Magnitude & Phase Recovery subsystem after the system has had time to adapt to the channel. Reducing the channel noise should reduce the size of each of the constellation points; increasing the channel noise begins to merge the distinct constellation points together. If the system has not successfully corrected for the frequency offset, then rotation of the constellation is visible here.

References

1. M. Luise and R. Reggiannini, "Carrier frequency recovery in all-digital modems for burst-mode transmissions," IEEE Trans. Communications, pp. 1169-1178, 1995.

2. M. Moeneclaey and G. De Jonghe, "ML-oriented NDA carrier synchronization for general rotationally symmetric signal constellations," IEEE Trans. Communications, pp. 2531-2533, 1994.

3. M. Rice, "Digital Communications - A Discrete-Time Approach," Prentice Hall, April 2008.

4. G. Long, F. Ling and J. G. Proakis, "The LMS algorithm with delayed coefficient adaptation," IEEE Trans. on Acoustics, Speech and Signal Processing, pp. 1397-1405, 1989.