HDL Optimized QPSK Receiver with Captured Data

This example shows how to optimize the QPSK receiver modeled in the QPSK Transmitter and Receiver example for HDL code generation and hardware implementation. The HDL-optimized model shows a QPSK receiver that addresses real-world communications issues such as carrier frequency offset, phase offset, and timing recovery in a hardware-friendly manner.

Overview

The HDL Optimized QPSK Receiver with Captured Data example provides a hardware-friendly solution that performs baseband processing to handle a time-varying frequency offset and a time-varying symbol delay. Specifically, this example provides an HDL-optimized reference design of a practical digital receiver to mitigate the above-mentioned impairments, and includes coarse frequency compensation, PLL-based fine frequency compensation, timing recovery with fixed-rate resampling, bit stuffing/skipping, frame synchronization, and phase ambiguity resolution.

Compared with the implementation of the receiver in the QPSK Transmitter and Receiver example, three major modifications have been made for efficient HDL code generation:

  • Streaming Input and Output: The HDL optimized QPSK receiver processes data one sample at a time. The captured real-world signal is streamed into the receiver front-end. The streaming output of the HDL optimized receiver is buffered and passed to the text message decoder.

  • Fixed-point: The QPSK receiver logic operates in fixed-point mode.

  • HDL optimized architecture: Several blocks have been redesigned to use hardware efficient algorithms and architectures.

Structure of the Example

The top-level structure of the QPSK receiver model is shown in the following figure. The HDLRx subsystem has been optimized for HDL code generation.

The input data is captured using two USRP® devices and the Communications Toolbox Support Package for USRP® Radio. Specifically, one USRP® device runs the QPSK Transmitter with USRP® Hardware example model and acts as a transmitter, while the other device is the receiver running the companion model QPSK Receiver with USRP® Hardware example. The captured data represents the baseband received signal with a sampling rate of 200 kHz. The data is sample-based and has a length of 200001 samples, which corresponds to a period of approximately 1 s.

The following diagram shows the detailed structure of the HDLRx subsystem.

The subsystems within are further described in the following sections.

1. Automatic Gain Control (AGC) - Adjusts the received signal amplitude to a desired level

2. Root Raised Cosine Receive Filter - Uses a rolloff factor of 0.5, and downsamples the input signal by two

3. Coarse Frequency Compensation - Estimates an approximate frequency offset of the received signal and corrects it

4. Fine Frequency Compensation - Compensates for the residual frequency offset and the phase offset

5. Timing Recovery - Resamples the input signal according to a recovered timing strobe so that symbol decisions are made at the optimum sampling instants

6. Data Decoding - Aligns the frame boundaries, resolves the phase ambiguity caused by the Fine Frequency Compensation subsystem, and demodulates the signal

The structure of the Text Message Decoding subsystem is shown below.

This subsystem is expected to run in software, so frame-based signals are preferable because they speed up the computation. The HDLRx subsystem outputs three sample-based Boolean signals: bit1, bit2, and dValid. Because the downstream processing requires a frame-based signal, the dataframer block converts the sample-based signals to frame-based counterparts. The demodulated bit pair, bit1 and bit2, is valid only when dValid is high. The dataframer block uses the dValid signal to fill a delay line with bit1 and bit2. The Descramble and Print subsystem processes the received data only when its enable signal goes high, which occurs when the delay line has accumulated exactly 200 valid demodulated bits and the RxGo signal is high. While the simulation is running, the Descramble and Print subsystem outputs the string "Hello world ###" to the MATLAB® command window, where '###' is a repeating sequence of '000', '001', '002', ..., '099'.
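The valid-gated framing described above can be sketched in software as follows. This is a minimal Python illustration, not the dataframer block itself: the names `DataFramer`, `push`, and `frame_stream` are hypothetical, and the RxGo gating is left out for brevity. Only the 200-bit frame length and the role of dValid come from the description above.

```python
class DataFramer:
    """Collect valid bit pairs into fixed-length frames (illustrative only)."""

    def __init__(self, frame_len=200):
        self.frame_len = frame_len
        self.buffer = []

    def push(self, bit1, bit2, d_valid):
        """Accept one sample; return a full frame when one is ready, else None."""
        if not d_valid:
            return None  # bit1 and bit2 are ignored while dValid is low
        self.buffer.extend((bit1, bit2))
        if len(self.buffer) >= self.frame_len:
            frame = self.buffer[:self.frame_len]
            self.buffer = self.buffer[self.frame_len:]
            return frame
        return None


def frame_stream(samples, frame_len=200):
    """Run (bit1, bit2, dValid) samples through the framer, return all frames."""
    framer = DataFramer(frame_len)
    frames = []
    for bit1, bit2, d_valid in samples:
        frame = framer.push(bit1, bit2, d_valid)
        if frame is not None:
            frames.append(frame)
    return frames
```

With dValid high on every other sample, 200 input samples are needed to fill one 200-bit frame, which matches the two-samples-per-symbol rate at the Timing Recovery output.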

HDL Optimized QPSK Receiver

1. AGC

The gain $K_p$ of the phase and timing error detectors is proportional to the received signal amplitude and the average symbol energy. To ensure an optimum loop design, the signal amplitude at the inputs of the carrier recovery and timing recovery loops must be stable. The AGC ensures that the amplitude of the input of the Coarse Frequency Compensation subsystem is 1/Upsampling Factor, so that the equivalent gains of the phase and timing error detectors stay constant over time. The AGC is placed before the Root Raised Cosine Receive Filter so that the signal amplitude can be measured with an oversampling factor of four, thus improving the accuracy of the estimate. Refer to Chapter 7.2.2 and Chapter 8.4.1 of [ 1 ] for details on how to design the detector gain $K_p$.

The AGC structure is shown in the following diagram, and pipeline registers are shown in green throughout the model.
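The idea of holding the amplitude at a fixed reference can be illustrated with a simple feedback gain loop. The sketch below is an assumption-laden illustration, not the structure used in the model: the function name `agc`, the linear update rule, and the step size are hypothetical, and the reference of 0.25 assumes an upsampling factor of four.

```python
def agc(samples, reference=0.25, step=0.05, gain=1.0):
    """First-order feedback AGC sketch.

    reference: desired output amplitude (assumed 1/4 here)
    step:      loop step size (hypothetical value)
    Returns the scaled samples and the final gain.
    """
    out = []
    for x in samples:
        y = gain * x
        out.append(y)
        # Raise the gain when the output is too small, lower it when too large.
        gain += step * (reference - abs(y))
    return out, gain
```

For a constant-envelope input of amplitude A, the gain settles at reference/A as long as step times A is less than 2.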

2. Root Raised Cosine Receive Filter

The Root Raised Cosine Receive Filter downsamples the input signal by a factor of two, with a rolloff factor of 0.5. It provides matched filtering for the transmitted waveform to boost the signal-to-noise ratio and facilitate the downstream signal processing.

The Root Raised Cosine Receive Filter is implemented using a fully parallel architecture.
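For reference, a root raised cosine filter with a rolloff of 0.5 followed by downsampling by two can be sketched as below. The tap formula is the standard closed-form root raised cosine impulse response. The filter span of eight symbols, the four samples per symbol at the input, the unit-energy normalization, and the function names are assumptions for illustration, and the direct-form loop does not reflect the fully parallel hardware architecture.

```python
import math


def rrc_taps(beta=0.5, sps=4, span=8):
    """Root raised cosine taps, normalized to unit energy (span in symbols)."""
    half = span * sps // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / sps  # time in symbol periods
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-9:
            # Limit value at t = +/- 1/(4*beta), where the general form is 0/0.
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]


def filter_and_decimate(samples, taps, factor=2):
    """Direct-form FIR that computes only every factor-th output sample."""
    out = []
    for n in range(0, len(samples), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```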

3. Coarse Frequency Compensation

The Coarse Frequency Compensation subsystem corrects the input signal with a rough estimate of the frequency offset. The following diagram shows the Coarse Frequency Compensation subsystem.

This subsystem models the baseband QPSK signal with phase index $n$, frequency offset $\Delta f$, and phase offset $\Delta \phi$ as $e^{j(n\pi/2+\Delta ft+\Delta \phi)}$, $n=0,1,2,3$. First, the subsystem raises the input signal to the power of four. Because $e^{j4(n\pi/2)}=e^{j2n\pi}=1$, the result is $e^{j(4\Delta ft+4\Delta \phi)}$, which is not a function of the QPSK modulation. The fourth power is implemented by cascading two product blocks. Then, from the modulation-independent signal, the subsystem estimates the tone at four times the frequency offset. After the estimate is divided by four, the resulting frequency offset is corrected in the original signal. There is usually a residual frequency offset even after the coarse frequency compensation, which would cause a slow rotation of the constellation. The Fine Frequency Compensation subsystem compensates for this residual frequency.
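The modulation-removal step can be checked numerically. In the short Python sketch below, `freq_term` stands for the value of $\Delta f t$ at one time instant, and the function names are hypothetical. Squaring twice mirrors the two cascaded product blocks.

```python
import cmath
import math


def fourth_power(z):
    """Raise z to the fourth power with two cascaded multiplications."""
    z2 = z * z
    return z2 * z2


def qpsk_sample(n, freq_term, phase_offset):
    """Unit-amplitude QPSK sample with phase index n in {0, 1, 2, 3}."""
    return cmath.exp(1j * (n * math.pi / 2 + freq_term + phase_offset))
```

For any fixed `freq_term` and `phase_offset`, the four possible values of `n` produce the same fourth power, $e^{j4(\Delta ft+\Delta \phi)}$.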

Compared with the implementations of the Coarse Frequency Compensation subsystem in the QPSK Transmitter and Receiver example and the QPSK Receiver with USRP® Hardware example, this implementation has several modifications:

  • To save resources, the FFT algorithm has been replaced by the frequency estimation algorithm proposed in [ 2 ], which is referred to as the Luise algorithm. Pipeline registers have been used in the data path of the Luise algorithm to break the critical path in the design. See the following diagram.

  • The $angle$ function, which constitutes a key component in the Luise algorithm, is computed using the Complex to Magnitude-Angle HDL Optimized block. This block computes the phase using the hardware friendly CORDIC algorithm. To learn more about the Complex to Magnitude-Angle HDL Optimized block, refer to the DSP System Toolbox documentation.

  • The detected phase offset is sent to an NCO to generate a complex exponential signal that is used to correct the phase offset in the original signal. The NCO HDL Optimized block maps the lookup table into a ROM, and provides a lookup table compression option to significantly reduce the lookup table size. To learn more about the NCO HDL Optimized block, refer to the DSP System Toolbox documentation.
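The autocorrelation-based estimator of [ 2 ] is commonly stated as $\hat{f} = \frac{1}{\pi (N+1)} \arg \sum_{k=1}^{N} R(k)$ in cycles per sample, where $R(k)$ is the sample autocorrelation at lag $k$. The Python sketch below applies that form to the fourth-power signal and divides by four. It is a floating-point illustration under assumptions: the function names, the choice $N=8$, and the block length are hypothetical, `cmath.phase` stands in for the CORDIC angle computation, and frequency is expressed in cycles per sample.

```python
import cmath
import math


def autocorr(z, lag):
    """Sample autocorrelation of z at the given positive lag."""
    total = sum(z[i] * z[i - lag].conjugate() for i in range(lag, len(z)))
    return total / (len(z) - lag)


def luise_reggiannini(z, max_lag):
    """Estimate the normalized frequency (cycles/sample) of a complex tone."""
    acc = sum(autocorr(z, k) for k in range(1, max_lag + 1))
    return cmath.phase(acc) / (math.pi * (max_lag + 1))


def coarse_offset(received, max_lag=8):
    """Remove QPSK modulation by a fourth power, estimate, divide by four."""
    tone = [(s * s) ** 2 for s in received]
    return luise_reggiannini(tone, max_lag) / 4.0
```

The estimate is unambiguous only while the fourth-power tone stays below 1/(N+1) cycles per sample in magnitude, so a larger N trades estimation range for noise averaging.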

4. Fine Frequency Compensation

The Fine Frequency Compensation subsystem, shown in the following figure, implements a phase-locked loop (PLL), described in Chapter 7 of [ 1 ], to track the residual frequency offset and the phase offset in the input signal.

A maximum likelihood Phase Error Detector (PED), described in Chapter 7.2.2 of [ 1 ], generates the phase error. A tunable proportional-plus-integral Loop Filter, described in Appendix C.2 of [ 1 ], filters the error signal and then feeds it into the NCO block. The NCO block generates a complex exponential signal that is used to correct the residual frequency and phase offsets in the output of the Coarse Frequency Compensation subsystem. Loop Bandwidth (normalized by the sample rate) and Loop Damping Factor are tunable for the Loop Filter. The default normalized loop bandwidth is set to 0.06 and the default damping factor is set to 2.5 (overdamped), so that the PLL quickly locks to the intended phase while introducing little phase noise.
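The loop described above can be sketched in floating point as follows. This is an illustration under stated assumptions rather than the model's fixed-point design: the gain formulas are the usual proportional-plus-integral expressions in terms of the normalized bandwidth and damping factor, the decision-directed QPSK detector has gain $K_p=\sqrt{2}$ for unit-amplitude symbols on the diagonals, the NCO gain is taken as 1, and all function names are hypothetical.

```python
import cmath
import math
import random


def loop_gains(bn_t, zeta, kp, k0=1.0):
    """Proportional (k1) and integral (k2) gains of a second-order loop."""
    theta = bn_t / (zeta + 1.0 / (4.0 * zeta))
    d = 1.0 + 2.0 * zeta * theta + theta * theta
    k1 = 4.0 * zeta * theta / d / (kp * k0)
    k2 = 4.0 * theta * theta / d / (kp * k0)
    return k1, k2


def qpsk_ped(y):
    """Decision-directed phase error for QPSK points at (+/-1 +/-1j)/sqrt(2)."""
    s_re = 1.0 if y.real >= 0 else -1.0
    s_im = 1.0 if y.imag >= 0 else -1.0
    return s_re * y.imag - s_im * y.real


def run_pll(received, bn_t=0.06, zeta=2.5):
    """Derotate each sample by the NCO phase and update the loop."""
    k1, k2 = loop_gains(bn_t, zeta, kp=math.sqrt(2.0))
    phase = 0.0        # NCO phase accumulator
    integrator = 0.0   # integral branch of the loop filter
    out = []
    for x in received:
        y = x * cmath.exp(-1j * phase)
        out.append(y)
        err = qpsk_ped(y)
        integrator += k2 * err
        phase += k1 * err + integrator
    return out
```

Because the detector cannot distinguish rotations by multiples of 90 degrees, the loop can settle on any of four phases, which is why phase ambiguity is resolved later in the chain.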

5. Timing Recovery

The Timing Recovery subsystem is shown in the following diagram.

The Timing Recovery subsystem implements a PLL, described in Chapter 8 of [ 1 ], to correct the timing error in the received signal. On average, the Timing Recovery subsystem generates one output sample for every two input samples.

The Interpolation Control subsystem implements a decrementing modulo-1 counter, described in Chapter 8.4.3 of [ 1 ], to generate the control signal that lets the Data Decoding subsystem select the correct interpolants of the Interpolation Filter. This control signal also enables the Timing Error Detector (TED), so that it calculates the timing errors at the correct timing instants. The Interpolation Control subsystem also updates the timing difference for the Interpolation Filter, which generates interpolants at the optimum sampling instants.
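A decrementing modulo-1 counter of this kind can be sketched as below. The counter value `eta` is reduced by a step `w` on every input sample; an underflow marks a sample at which an interpolant is produced, and the ratio `eta / w` at that instant gives the fractional timing difference. The function name and the floating-point arithmetic are assumptions for illustration. In the loop, `w` would be the nominal value 1/2 plus the loop filter output.

```python
def interpolation_control(w_values, eta=0.0):
    """Decrementing modulo-1 counter.

    w_values: per-sample counter step (nominally 1/2 for two samples/symbol)
    Returns (strobes, mus): strobe is 1 on underflow, and mu is the
    fractional interval at each strobe (None when there is no strobe).
    """
    strobes, mus = [], []
    for w in w_values:
        underflow = eta < w
        strobes.append(1 if underflow else 0)
        mus.append(eta / w if underflow else None)
        eta = (eta - w) % 1.0
    return strobes, mus
```

With a constant step of 1/2 the strobe alternates between one and zero. A step slightly above 1/2 occasionally yields two consecutive ones, and a step slightly below 1/2 yields two consecutive zeros.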

The Interpolation Filter is a Farrow parabolic filter with $\alpha=0.5$ as described in Chapter 8.4.2 of [ 1 ]. The filter uses an $\alpha$ of 0.5 so that all the filter coefficients become 1, -1/2 and 3/2, which significantly simplifies the interpolator structure.
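A piecewise parabolic interpolator in Farrow form can be sketched as follows. The coefficient layout is the commonly tabulated one for this interpolator and should be checked against Chapter 8.4.2 of [ 1 ]. The argument names and the Horner-style evaluation are assumptions for illustration.

```python
def farrow_parabolic(x_next2, x_next1, x_cur, x_prev, mu, alpha=0.5):
    """Interpolate between x_cur (mu = 0) and x_next1 (mu = 1).

    x_prev, x_cur, x_next1, x_next2 are four consecutive input samples,
    and mu in [0, 1) is the fractional interval.
    """
    # Each branch is a fixed FIR filter; with alpha = 0.5 its taps need
    # only shifts and adds in hardware.
    v2 = alpha * (x_next2 - x_next1 - x_cur + x_prev)
    v1 = (-alpha * x_next2 + (1 + alpha) * x_next1
          + (alpha - 1) * x_cur - alpha * x_prev)
    v0 = x_cur
    # Horner evaluation of v2*mu^2 + v1*mu + v0.
    return (v2 * mu + v1) * mu + v0
```

For example, on the linear ramp 0, 1, 2, 3 with `mu = 0.25`, the interpolant between the samples 1 and 2 is 1.25.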

Based on the interpolants, timing errors are generated by a zero-crossing Timing Error Detector as described in Chapter 8.4.1 of [ 1 ], filtered by a tunable proportional-plus-integral Loop Filter as described in Appendix C.2 of [ 1 ], and fed into the Interpolation Control for a timing difference update. Loop Bandwidth (normalized by the sample rate) and Loop Damping Factor are tunable for the Loop Filter. The default normalized loop bandwidth is set to 0.01 and the default damping factor is set to unity (critical damping) so that the PLL quickly locks to the correct timing while introducing little phase noise.
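The zero-crossing detector can be illustrated with a few lines of Python. The sketch assumes two interpolants per symbol, so that `midpoint` is the interpolant halfway between the two symbol-rate interpolants `prev_symbol` and `cur_symbol`. The function names and the sign convention of the error are assumptions, so consult Chapter 8.4.1 of [ 1 ] for the exact definition.

```python
def sign(v):
    """Hard decision on one rail of the QPSK symbol."""
    return 1.0 if v >= 0 else -1.0


def zc_ted(prev_symbol, midpoint, cur_symbol):
    """Zero-crossing timing error computed on both the I and Q rails.

    The error is nonzero only when a rail changes sign between the two
    symbols, and it is proportional to how far the midpoint sample is
    from the expected zero crossing.
    """
    e_i = midpoint.real * (sign(prev_symbol.real) - sign(cur_symbol.real))
    e_q = midpoint.imag * (sign(prev_symbol.imag) - sign(cur_symbol.imag))
    return e_i + e_q
```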

When the timing error (delay) reaches symbol boundaries, there is one extra or missing interpolant in the output. The TED implements bit stuffing or skipping to handle the extra or missing interpolants. You can refer to Chapter 8.4.4 of [ 1 ] for details of bit stuffing/skipping.

The timing recovery loop normally generates one output symbol for every two input samples. It also outputs a timing strobe (dValid signal) that runs at the input sample rate. Under normal circumstances, the strobe value is simply a sequence of alternating ones and zeros. However, this occurs only when the relative delay between transmitter and receiver contains some fractional part of one symbol period and the integer part of the delay (in symbols) remains constant. If the integer part of the relative delay changes, the strobe value can have two consecutive zeros or two consecutive ones.

6. Data Decoding

The Data Decoding subsystem performs frame synchronization, phase ambiguity resolution, and QPSK demodulation. Its structure is shown in the diagram below:

  • Frame synchronization: The Matched Filter subsystem uses a QPSK-modulated Barker code as a reference to correlate against the received symbols. The modulus of the matched filter output is calculated in the Modulus subsystem and then compared with a threshold. Frame synchronization is declared if the modulus output exceeds the threshold. The threshold for frame synchronization is tunable: a large value increases the miss probability whereas a small value increases the probability of false alarm. In this example, the threshold value is set to 16.

  • Phase ambiguity resolution: The carrier phase PLL of the Fine Frequency Compensation subsystem may lock to the unmodulated carrier with a phase shift of 0, 90, 180, or 270 degrees, which can cause a phase ambiguity. For details of phase ambiguity and its resolution, refer to Chapter 7.2.2 and 7.7 in [ 1 ]. The angle of the matched filter output determines the extra phase shift. The Matched Filter output is fed into the conjugate block to negate the extra phase shift. Once frame synchronization is achieved, the conjugated version of the matched filter output is frozen and multiplied with all the symbols in a frame to effectively resolve the phase ambiguity issue.

  • QPSK demodulation: Each corrected symbol is demodulated and mapped to a pair of bits based on the symbol mapping of QPSK constellation.
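The three steps above can be sketched together in Python. Several details are assumptions made only for illustration: the header is a 13-chip Barker code with each chip mapped to a symbol on the diagonal, symbols have unit amplitude, the threshold of 10 suits that scaling (the model uses 16 with its own scaling), a negative rail maps to bit 1, and the function names are hypothetical.

```python
import math

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
# Assumed mapping: each Barker chip becomes one diagonal QPSK symbol.
HEADER = [b * (1 + 1j) / math.sqrt(2) for b in BARKER13]


def find_frame(symbols, threshold=10.0):
    """Return (start index, matched filter output) at the first detection."""
    for n in range(len(symbols) - len(HEADER) + 1):
        corr = sum(symbols[n + k] * HEADER[k].conjugate()
                   for k in range(len(HEADER)))
        if abs(corr) > threshold:
            return n, corr
    return None, None


def decode_frame(symbols, payload_len, threshold=10.0):
    """Synchronize, undo the phase ambiguity, and demodulate the payload."""
    start, corr = find_frame(symbols, threshold)
    if start is None:
        return None
    fix = corr.conjugate()  # frozen for the whole frame
    first = start + len(HEADER)
    bits = []
    for s in symbols[first:first + payload_len]:
        y = s * fix  # rotates the symbol back by the detected phase shift
        bits.extend((int(y.real < 0), int(y.imag < 0)))
    return bits
```

Because only the signs of the corrected symbol are used for the bit decisions, the conjugated matched filter output does not need to be normalized before the multiplication.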

Results and Displays

When running the simulation, the model displays three scatter plots to show the constellation of the Fine Frequency Compensation output, the Timing Recovery output, and the frozen conjugated matched filter output, respectively.

The following diagram shows the constellation plot of the Fine Frequency Compensation output. The constellation points are scattered, mainly for two reasons:

  • The timing error between the clocks at the transmitter and receiver

  • The signals are oversampled by a factor of two. Therefore, half of the symbols are in the transition state between QPSK symbols.

The following diagram shows the constellation of the Timing Recovery output. The plot shows four concentrated clusters around the true 4-point constellation for QPSK modulation, which verifies the effectiveness of the Timing Recovery subsystem. However, as mentioned before, the Fine Frequency Compensation subsystem may lock to the signal with a phase shift of 0, 90, 180, or 270 degrees. Therefore, the phase ambiguity must be resolved before the signal is demodulated.

The following figure shows the constellation plot of the frozen conjugated matched filter output. During the entire course of the simulation, the signals group on the positive side of the horizontal axis. This grouping indicates that there are no phase ambiguity issues in this example. If compensation were required for an undesired phase shift in order to resolve the phase ambiguity issue, the constellation plots after Timing Recovery would be rotated.

HDL Code Generation

Pipeline registers (shown in green) have been added throughout the model to make sure the HDLRx subsystem does not have a long critical path. The HDL code generated from the HDLRx subsystem was synthesized using Xilinx® ISE on a Virtex6 (xc6vlx75t) FPGA, and the circuit ran at about 145 MHz.

You can use the commands makehdl and makehdltb to generate HDL code and testbench for subsystems in HDLRx. To generate the HDL code, use the following command:

   makehdl(subsysname)

To generate the testbench, use the following command:

   makehdltb(subsysname)

References

1. Michael Rice, "Digital Communications - A Discrete-Time Approach", Prentice Hall, April 2008.

2. M. Luise and R. Reggiannini, "Carrier frequency recovery in all-digital modems for burst-mode transmissions," IEEE Trans. Communications, pp. 1169-1178, 1995.

Copyright Notice

USRP® is a trademark of National Instruments Corp.