dsp.HDLFFT

Fast Fourier transform — optimized for HDL code generation

Description

The HDL FFT System object™ provides two architectures to optimize either throughput or area. Use the streaming Radix 2^2 architecture for high-throughput applications. This architecture supports scalar or vector input data. You can achieve giga-sample-per-second (GSPS) throughput using vector input. Use the burst Radix 2 architecture for a minimum resource implementation, especially with large FFT sizes. Your system must be able to tolerate bursty data and higher latency. This architecture supports only scalar input data. The object accepts real or complex data, provides hardware-friendly control signals, and has optional output frame control signals.

To calculate the fast Fourier transform:

Create the dsp.HDLFFT object and set its properties.
Call the object with arguments, as if it were a function.

To learn more about how System objects work, see What Are System Objects?.

Creation

Syntax

FFT_N = dsp.HDLFFT

FFT_N = dsp.HDLFFT(Name,Value)

Description

FFT_N = dsp.HDLFFT returns an HDL FFT System object, FFT_N, that performs a fast Fourier transform.

FFT_N = dsp.HDLFFT(Name,Value) sets properties using one or more name-value pairs. Enclose each property name in single quotes.

Example: fft128 = dsp.HDLFFT('FFTLength',128)

Properties

Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the release function unlocks them.

If a property is tunable, you can change its value at any time.

For more information on changing property values, see System Design in MATLAB Using System Objects.

`Architecture` — Hardware implementation
`'Streaming Radix 2^2'` (default) | `'Burst Radix 2'`

Hardware implementation, specified as either:

'Streaming Radix 2^2' — Low-latency architecture. Supports giga-sample-per-second (GSPS) throughput when you use vector input.
'Burst Radix 2' — Minimum resource architecture. Vector input is not supported when you select this architecture.

`ComplexMultiplication` — HDL implementation of complex multipliers
`'Use 4 multipliers and 2 adders'` (default) | `'Use 3 multipliers and 5 adders'`

HDL implementation of complex multipliers, specified as either 'Use 4 multipliers and 2 adders' or 'Use 3 multipliers and 5 adders'. Depending on your synthesis tool and target device, one option may be faster or smaller.
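
The two options correspond to two algebraically equivalent ways of forming a complex product. The sketch below uses one common three-multiplier factoring. The source does not state which factoring the generated HDL uses, so the intermediate names `k1`, `k2`, and `k3` and the sample values are illustrative assumptions.

```matlab
% Two equivalent ways to compute (a + jb)*(c + jd).
% The sample values and the k1..k3 factoring are illustrative assumptions.
a = 3; b = -2; c = 5; d = 7;

% 4 multipliers, 2 adders
re4 = a*c - b*d;
im4 = a*d + b*c;

% 3 multipliers, 5 adders (one common factoring)
k1 = c*(a + b);     % adder 1, multiplier 1
k2 = a*(d - c);     % adder 2, multiplier 2
k3 = b*(c + d);     % adder 3, multiplier 3
re3 = k1 - k3;      % adder 4
im3 = k1 + k2;      % adder 5

assert(re3 == re4 && im3 == im4)
```

The three-multiplier form trades one multiplier for three extra adders, which is why the better choice can depend on the DSP and logic resources of the target device.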

`BitReversedOutput` — Order of the output data
`true` (default) | `false`

Order of the output data, specified as either:

true — The output channel elements are bit reversed relative to the input order.
false — The output channel elements are in linear order.

The FFT algorithm calculates output in the reverse order to the input. When you request output in the same order as the input, the algorithm performs an extra reversal operation. For more information on ordering of the output, see Linear and Bit-Reversed Output Order.
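
To make the bit-reversed ordering concrete, this minimal sketch computes the bit-reversed index sequence for a small transform using only base MATLAB functions. The length `N = 8` is an arbitrary choice for illustration.

```matlab
% Bit-reversed index order for an N-point transform (N a power of 2).
N = 8;
nbits = log2(N);
linearIdx = 0:N-1;

% Write each index as an nbits-wide binary string, reverse the digits,
% and convert back to a number.
bits = dec2bin(linearIdx, nbits);        % N-by-nbits char array
bitRevIdx = bin2dec(fliplr(bits)).';     % 1-by-N row vector

disp(bitRevIdx)                          % indices 0 4 2 6 1 5 3 7
```

With bit-reversed output enabled, the sample in output position k corresponds to the frequency bin whose index is the bit reversal of k.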

`BitReversedInput` — Expected order of the input data
`false` (default) | `true`

Expected order of the input data, specified as either:

true — The input channel elements are in bit-reversed order.
false — The input channel elements are in linear order.

`Normalize` — Output scaling
`false` (default) | `true`

Output scaling, specified as either:

true — The object implements an overall 1/N scale factor by dividing the output of each butterfly multiplication by 2. This adjustment keeps the output of the FFT in the same amplitude range as its input.
false — The object avoids overflow by increasing the word length by one bit after each butterfly multiplication. The bit growth is the same for both architectures.
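
The effect of each setting can be estimated with simple arithmetic. This sketch assumes one radix-2 butterfly per factor of 2 in the FFT length, so an N-point FFT has log2(N) butterflies. The 16-bit input word length is an example value.

```matlab
% Assumes log2(N) radix-2 butterflies for an N-point FFT.
N = 1024;
inputWL = 16;                          % example input word length, in bits
numButterflies = log2(N);              % 10 for N = 1024

% Normalize = true: divide by 2 at each butterfly, word length unchanged.
overallScale = (1/2)^numButterflies;   % equals 1/N

% Normalize = false: one extra bit at each butterfly.
outputWL = inputWL + numButterflies;   % 26 bits
```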

`FFTLength` — Number of data points used for one FFT calculation
1024 (default) | integer power of 2 between 2³ and 2¹⁶

Number of data points used for one FFT calculation, specified as an integer power of 2 between 2³ and 2¹⁶. The object accepts FFT lengths outside this range, but they are not supported for HDL code generation.

`ResetInputPort` — Enable reset argument
`false` (default) | `true`

Enable reset input argument to the object. When reset is true, the object stops calculation and clears all internal state.

`StartOutputPort` — Enable start output argument
`false` (default) | `true`

Enable startOut output argument of the object. When enabled, the object returns an additional output signal that is true on the first cycle of each valid output frame.

`EndOutputPort` — Enable end output argument
`false` (default) | `true`

Enable endOut output argument of the object. When enabled, the object returns an additional output signal that is true on the last cycle of each valid output frame.

`RoundingMethod` — Rounding mode used for fixed-point operations
`'Floor'` (default) | `'Ceiling'` | `'Convergent'` | `'Nearest'` | `'Round'` | `'Zero'`

Rounding mode used for fixed-point operations. When the input is any integer or fixed-point data type, the FFT algorithm uses fixed-point arithmetic for internal calculations. This option does not apply when the input is single or double type. Rounding applies to twiddle factor multiplication and scaling operations.

Usage

Syntax

[Y,validOut] = FFT_N(X,validIn)

[Y,validOut,ready] = FFT_N(X,validIn)

[Y,startOut,endOut,validOut] = FFT_N(X,validIn)

[Y,validOut] = FFT_N(X,validIn,resetIn)

[Y,startOut,endOut,validOut] = FFT_N(X,validIn,resetIn)

Description

[Y,validOut] = FFT_N(X,validIn) returns the FFT, Y, of the input, X, when validIn is true. validIn and validOut are logical scalars that indicate the validity of the input and output signals, respectively.

[Y,validOut,ready] = FFT_N(X,validIn) returns the fast Fourier transform (FFT) when using the burst Radix 2 architecture. The ready signal indicates when the object can accept input samples.

To use this syntax, set the Architecture property to 'Burst Radix 2'. For example:

FFT_N = dsp.HDLFFT(___,'Architecture','Burst Radix 2');
...
[y,validOut,ready] = FFT_N(x,validIn)

[Y,startOut,endOut,validOut] = FFT_N(X,validIn) also returns frame control signals startOut and endOut. startOut is true on the first sample of a frame of output data. endOut is true for the last sample of a frame of output data.

To use this syntax, set the StartOutputPort and EndOutputPort properties to true. For example:

FFT_N = dsp.HDLFFT(___,'StartOutputPort',true,'EndOutputPort',true);
...
[y,startOut,endOut,validOut] = FFT_N(x,validIn)

[Y,validOut] = FFT_N(X,validIn,resetIn) returns the FFT when validIn is true and resetIn is false. When resetIn is true, the object stops the current calculation and clears all internal state.

To use this syntax, set the ResetInputPort property to true. For example:

FFT_N = dsp.HDLFFT(___,'ResetInputPort',true);
...
[y,validOut] = FFT_N(x,validIn,resetIn)

[Y,startOut,endOut,validOut] = FFT_N(X,validIn,resetIn) returns the FFT, Y, using all optional control signals. You can use any combination of the optional port syntaxes.

Input Arguments

`X` — Input data
scalar or column vector of real or complex values

Input data, specified as a scalar or column vector of real or complex values, in fixed-point or integer format. Vector input is supported with 'Streaming Radix 2^2' architecture only. The vector size must be a power of 2 between 1 and 64, and not greater than the FFT length.

double and single data types are supported for simulation, but not for HDL code generation.
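
The vector size constraint can be checked before calling the object. This is a minimal sketch; `isValidVectorSize` is a hypothetical helper name, not a toolbox function.

```matlab
% Hypothetical helper: true when V is a power of 2 between 1 and 64
% and not greater than the FFT length N.
isValidVectorSize = @(V, N) V >= 1 && V <= 64 && V <= N && ...
    V == 2^nextpow2(V);

ok16  = isValidVectorSize(16, 128);    % true
ok12  = isValidVectorSize(12, 128);    % false: not a power of 2
ok128 = isValidVectorSize(128, 1024);  % false: greater than 64
```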

`validIn` — Validity of input data
logical scalar

Validity of input data, specified as a logical scalar.

Data Types: logical

`resetIn` — Reset internal state
logical scalar

Reset internal state, specified as a logical scalar. To enable this argument, set the ResetInputPort property to true.

Data Types: logical

Output Arguments

`Y` — Output data
scalar or column vector of real or complex values

Output data, returned as a scalar or column vector of real or complex values. The output format matches the format of the input data.

`ready` — Memory available for input data
logical scalar

Indication that the object has memory available for input data, returned as a logical scalar. This output is returned when you select 'Burst Radix 2' architecture.

Data Types: logical

`startOut` — First sample of output frame
logical scalar

First sample of output frame, returned as a logical scalar. To enable this argument, set the StartOutputPort property to true.

Data Types: logical

`endOut` — Last sample of output frame
logical scalar

Last sample of output frame, returned as a logical scalar. To enable this argument, set the EndOutputPort property to true.

Data Types: logical

`validOut` — Validity of output data
logical scalar

Validity of output data, returned as a logical scalar.

Data Types: logical

Object Functions

To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:

release(obj)

Specific to dsp.HDLFFT

`getLatency`	Latency of FFT or channelizer calculation

Common to All System Objects

`step`	Run System object algorithm
`release`	Release resources and allow changes to System object property values and input characteristics
`reset`	Reset internal states of System object

Examples

Create FFT for HDL Generation

Create the specifications and input signal.

N = 128;
Fs = 40;
t = (0:N-1)'/Fs;
x = sin(2*pi*15*t) + 0.75*cos(2*pi*10*t);
y = x + .25*randn(size(x));
y_fixed = sfi(y,32,24);

Write a function that creates and calls the System object™. You can generate HDL from this function.

Note: This object syntax runs only in R2016b or later. If you are using an earlier release, replace each call of an object with the equivalent step syntax. For example, replace myObject(x) with step(myObject,x).

function [yOut,validOut] = HDLFFT128(yIn,validIn)
%HDLFFT128 
% Processes one sample of FFT data using the dsp.HDLFFT System object(TM)
% yIn is a fixed-point scalar or column vector. 
% validIn is a logical scalar value.
% You can generate HDL code from this function.

  persistent fft128;
  if isempty(fft128)
    fft128 = dsp.HDLFFT('FFTLength',128);
  end    
  [yOut,validOut] = fft128(yIn,validIn);
end

Compute the FFT by calling the function for each data sample.

Yf = zeros(1,3*N);
validOut = false(1,3*N);
for loop = 1:1:3*N
    if (mod(loop, N) == 0)
        i = N;
    else
        i = mod(loop, N);
    end
    [Yf(loop),validOut(loop)] = HDLFFT128(complex(y_fixed(i)),(loop <= N));
end

Discard invalid data samples. Then plot the frequency channel results from the FFT.

Yf = Yf(validOut == 1);
Yr =  bitrevorder(Yf);
plot(Fs/2*linspace(0,1,N/2), 2*abs(Yr(1:N/2)/N))
title('Single-Sided Amplitude Spectrum of Noisy Signal y(t)')
xlabel('Frequency (Hz)')
ylabel('Output of FFT (f)')

Create Vector-Input FFT for HDL Generation

Create specifications and input signal. This example uses a 128-point FFT and computes the transform over 16 samples at a time.

N = 128;
V = 16;
Fs = 40;
t = (0:N-1)'/Fs;
x = sin(2*pi*15*t) + 0.75*cos(2*pi*10*t);
y = x + .25*randn(size(x));
y_fixed = sfi(y,32,24);
y_vect = reshape(y_fixed,V,N/V);

Write a function that creates and calls the System object™. The function does not need to know the vector size. The object saves the size of the input signal the first time you call it.

function [yOut,validOut] = HDLFFT128V16(yIn,validIn)
%HDLFFT128V16 
% Processes 16-sample vectors of FFT data 
% yIn is a fixed-point column vector. 
% validIn is a logical scalar value.
% You can generate HDL code from this function.

  persistent fft128v16;
  if isempty(fft128v16)
    fft128v16 = dsp.HDLFFT('FFTLength',128);
  end    
  [yOut,validOut] = fft128v16(yIn,validIn);
end

Compute the FFT by passing 16-element vectors to the object. Use the getLatency function to find out when the first output data sample will be ready. Then, add the frame length to determine how many times to call the object. Because the object variable is inside the function, use a second object to call getLatency. Use the loop counter to flip validIn to false after N input samples.

tempfft = dsp.HDLFFT;
loopCount = getLatency(tempfft,N,V)+N/V;
Yf = zeros(V,loopCount);
validOut = false(1,loopCount);  % validOut is a scalar for each call
for loop = 1:1:loopCount
    if ( mod(loop,N/V) == 0 )
        i = N/V;
    else
        i = mod(loop,N/V);
    end
    [Yf(:,loop),validOut(loop)] = HDLFFT128V16(complex(y_vect(:,i)),(loop<=N/V));
end

Discard invalid output samples.

C = Yf(:,validOut==1);
Yf_flat = C(:);

Plot the frequency channel data from the FFT. The FFT output is in bit-reversed order. Reorder it before plotting.

Yr =  bitrevorder(Yf_flat);
plot(Fs/2*linspace(0,1,N/2),2*abs(Yr(1:N/2)/N))
title('Single-Sided Amplitude Spectrum of Noisy Signal y(t)')
xlabel('Frequency (Hz)')
ylabel('Output of FFT (f)')

Explore Latency of HDL FFT Object

The latency of the object varies with the FFT length and the vector size. Use the getLatency function to find the latency of a particular configuration. The latency is the number of cycles between the first valid input and the first valid output, assuming that the input is contiguous.

Create a new dsp.HDLFFT object and request the latency.

hdlfft = dsp.HDLFFT('FFTLength',512);
L512 = getLatency(hdlfft)

L512 = 599

Request hypothetical latency information about a similar object with a different FFT length. The properties of the original object do not change.

L256 = getLatency(hdlfft,256)

L256 = 329

N = hdlfft.FFTLength

N = 512

Request hypothetical latency information of a similar object that accepts eight-sample vector input.

L256v8 = getLatency(hdlfft,256,8)

L256v8 = 93

Enable scaling at each stage of the FFT. The latency does not change.

hdlfft.Normalize = true;
L512n = getLatency(hdlfft)

L512n = 599

Request the same output order as the input order. The latency increases because the object must collect the output before reordering.

hdlfft.BitReversedOutput = false;
L512r = getLatency(hdlfft)

L512r = 1078

Algorithms

Streaming Radix 2^2

The streaming Radix 2^2 architecture implements a low-latency architecture. It saves resources compared to a streaming Radix 2 implementation by factoring and grouping the FFT equation. The architecture has log₄(N) stages. Each stage contains two single-path delay feedback (SDF) butterflies with memory controllers. When you use vector input, each stage operates on fewer input samples, so some stages reduce to a simple butterfly, without SDF.

The first SDF stage is a regular butterfly. The second stage multiplies the outputs of the first stage by -j. To avoid a hardware multiplier, the object swaps the real and imaginary parts of the inputs, and again swaps the imaginary parts of the resulting outputs. Each stage rounds the result of the twiddle factor multiplication to the input word length. The twiddle factors have the same bit width as the input data, WL: two integer bits and WL-2 fractional bits.
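
The reason multiplication by -j needs no multiplier is the identity (a + jb)(-j) = b - ja. This sketch checks the identity numerically. It shows the arithmetic only and is not a description of the exact HDL structure; the sample value is arbitrary.

```matlab
% Multiplying by -j is a swap of real and imaginary parts plus a negation.
x = complex(3, 4);
viaMultiply = x * (-1j);                 % 4 - 3j
viaSwap = complex(imag(x), -real(x));    % swap parts, negate the new imaginary part

assert(viaMultiply == viaSwap)
```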

If you enable scaling, the algorithm divides the result of each butterfly stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the algorithm avoids overflow by increasing the word length by 1 bit at each stage. The diagram shows the butterflies and internal word lengths of each stage, not including the memory.

Burst Radix 2

The burst Radix 2 architecture implements the FFT by using a single complex butterfly multiplier. The algorithm cannot start until it has stored the entire input frame, and it cannot accept the next frame until computations are complete. The output ready port indicates when the algorithm is ready for new data. The diagram shows the burst architecture, with pipeline registers.

Control Signals

The algorithm processes input data only when the input valid port is 1. Output data is valid only when the output valid port is 1.

When the optional input reset port is 1, the algorithm stops the current calculation and clears all internal states. The algorithm begins new calculations when reset port is 0 and the input valid port starts a new frame.

Timing Diagram

This diagram shows the input and output valid port values for contiguous scalar input data, streaming Radix 2^2 architecture, an FFT length of 1024, and a vector size of 16.

The diagram also shows the optional start and end port values that indicate frame boundaries. If you enable the start port, the start port value pulses for one cycle with the first valid output of the frame. If you enable the end port, the end port value pulses for one cycle with the last valid output of the frame.

If you apply continuous input frames, the output will also be continuous after the initial latency.

The input valid port can be noncontiguous. Data accompanied by an input valid port is processed as it arrives, and the resulting data is stored until a frame is filled. Then the algorithm returns contiguous output samples in a frame of N (FFT length) cycles. This diagram shows noncontiguous input and contiguous output for an FFT length of 512 and a vector size of 16.

When you use the burst architecture, you cannot provide the next frame of input data until memory space is available. The ready port indicates when the algorithm can accept new input data.

Latency

The latency varies with the FFT length and the vector size. Use the getLatency function to find the latency of a particular configuration. The latency is the number of cycles between the first valid input and the first valid output, assuming that the input is contiguous.

When using the burst architecture with a contiguous input, if your design waits for ready to go to 0 before deasserting the input valid signal, then one extra cycle of data arrives at the input. This data sample is the first sample of the next frame. The algorithm can save one sample while processing the current frame. Because of this one-sample advance, the observed latency of later frames (from input valid to output valid) is one cycle shorter than the reported latency. The latency is measured from the first cycle when input valid is 1 to the first cycle when output valid is 1. The number of cycles between when the ready port is 0 and the output valid port is 1 is always latency - FFTLength.
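
As a worked example of that relationship, this sketch uses the burst Radix 2 latency reported in the Performance section for a 1024-point FFT. The variable names are illustrative.

```matlab
% Cycles from ready going to 0 until output valid goes to 1.
latency = 5811;      % burst Radix 2 latency from the Performance section
fftLength = 1024;
readyLowToValid = latency - fftLength;   % 4787 cycles
```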

Performance

This resource and performance data is the synthesis result from the generated HDL targeted to a Xilinx® Virtex®-6 (XC6VLX75T-1FF484) FPGA. The examples in the tables have this configuration:

1024 FFT length (default)
Complex multiplication using 4 multipliers, 2 adders
Output scaling enabled
Natural order input, Bit-reversed output
16-bit complex input data
Clock enables minimized (HDL Coder™ parameter)

Performance of the synthesized HDL code varies with your target and synthesis options. For instance, reordering for a natural-order output uses more RAM than the default bit-reversed output, and real input uses less RAM than complex input.

For a scalar input Radix 2^2 configuration, the design achieves 326 MHz clock frequency. The latency is 1116 cycles. The design uses these resources.

Resource	Number Used
LUT	4597
FFS	5353
Xilinx LogiCORE® DSP48	12
Block RAM (16K)	6

When you vectorize the same Radix 2^2 implementation to process two 16-bit input samples in parallel, the design achieves 316 MHz clock frequency. The latency is 600 cycles. The design uses these resources.

Resource	Number Used
LUT	7653
FFS	9322
Xilinx LogiCORE DSP48	24
Block RAM (16K)	8
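
The sample throughput implied by the two streaming results can be estimated as clock frequency times vector size. This sketch assumes the design accepts one valid scalar or vector on every clock cycle, which is an assumption; the synthesis results report only clock frequency.

```matlab
% Estimated throughput in megasamples per second (MSPS),
% assuming one valid input per clock cycle.
scalarMSPS = 326 * 1;   % scalar input at 326 MHz
vectorMSPS = 316 * 2;   % two-sample vector input at 316 MHz
```

Under that assumption, the two-sample configuration nearly doubles throughput (632 MSPS compared with 326 MSPS) at the cost of roughly twice the DSP48 resources.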

The object supports only scalar input data when implementing the burst Radix 2 architecture. The burst design achieves 309 MHz clock frequency. The latency is 5811 cycles. The design uses these resources.

Resource	Number Used
LUT	971
FFS	1254
Xilinx LogiCORE DSP48	3
Block RAM (16K)	6
