IFFT HDL Optimized

Compute the inverse fast Fourier transform (IFFT) and generate optimized HDL code

Library:
DSP System Toolbox HDL Support / Transforms

Description

The IFFT HDL Optimized block provides two architectures that implement the algorithm for FPGA and ASIC applications. You can select an architecture that optimizes for either throughput or area.

Streaming Radix 2^2: Use this architecture for high-throughput applications. This architecture supports scalar or vector input data. You can achieve giga-sample-per-second (GSPS) throughput using vector input.
Burst Radix 2: Use this architecture for a minimum resource implementation, especially with large fast Fourier transform (FFT) sizes. Your system must be able to tolerate bursty data and higher latency. This architecture supports only scalar input data.

The IFFT HDL Optimized block accepts real or complex data and provides hardware-friendly control signals, including optional output frame control signals.

Ports

Input

`data` — Input data
scalar or column vector of real or complex values

Input data, specified as a scalar or column vector of real or complex values. Only the Streaming Radix 2^2 architecture supports a vector input. The vector size must be a power of 2, in the range from 1 to 64, and less than or equal to the FFT length.

double and single data types are supported for simulation, but not for HDL code generation.
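The vector size rule above can be checked before building a model. The following is a minimal sketch, not part of the block's API; the function names `is_power_of_two` and `is_valid_vector_size` are hypothetical helpers introduced only for illustration.

```python
def is_power_of_two(value: int) -> bool:
    """Return True when value is a positive power of 2 (1, 2, 4, ...)."""
    return value > 0 and (value & (value - 1)) == 0


def is_valid_vector_size(vector_size: int, fft_length: int) -> bool:
    """Check the documented rule: power of 2, from 1 to 64, and <= FFT length."""
    return (
        is_power_of_two(vector_size)
        and 1 <= vector_size <= 64
        and vector_size <= fft_length
    )
```

For example, a vector size of 16 passes for an FFT length of 1024, while 3 (not a power of 2) and 128 (greater than 64) do not.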

`valid` — Indicates valid input data
scalar

This port indicates if the input data is valid. When the input valid is true (1), the block captures the value on the input data port. When the input valid is false (0), the block ignores the input data samples.

Data Types: Boolean

`reset` — Reset control signal
scalar

When reset is true (1), the block stops the current calculation and clears all internal states. The block starts a new frame when the reset is false (0) and the input valid is true (1).

Dependencies

To enable this port, select the Enable reset input port parameter.

Data Types: Boolean

Output

`data` — Frequency channel output data
scalar or column vector of real or complex values

When the input is a fixed-point data type and scaling is enabled, the output data type is the same as the input data type. When the input is an integer type and scaling is enabled, the output is a fixed-point type with the same word length as the input integer. If scaling is disabled, the output word length increases to avoid overflow. For more information, see the Divide butterfly outputs by two parameter. The output order is bit-reversed by default. Only the Streaming Radix 2^2 architecture supports vector input and output.

Data Types: fixed point | double | single
Complex Number Support: Yes

`valid` — Indicates valid output data
scalar

This port indicates that the output data is valid. When valid is true (1), the block returns valid data on the output data port. When valid is false (0), the values on the output data port are not valid.

Data Types: Boolean

`ready` — Indicates block is ready
scalar

This port indicates that the block is ready for a new input sample. When ready is true (1), the block accepts input data in the next time step, and when ready is false (0), the block ignores the input data in the next time step.

Dependencies

The port appears on the block when you set the Architecture parameter to Burst Radix 2.

Data Types: Boolean

`start` — Indicates first valid cycle of output data
scalar

When you enable this port, the block sets the start output to true (1) during the first valid cycle of a frame of output data.

Dependencies

To enable this port, select the Enable start output port parameter.

Data Types: Boolean

`end` — Indicates last valid cycle of output data
scalar

When you enable this port, the block sets the end output to true (1) during the last valid cycle of a frame of output data.

Dependencies

To enable this port, select the Enable end output port parameter.

Data Types: Boolean

Parameters

Main

`FFT length` — Number of data points used for one FFT calculation
`1024` (default)

This parameter specifies the number of data points used for one inverse fast Fourier transform (IFFT) calculation. For HDL code generation, the FFT length must be a power of 2 between 2³ and 2¹⁶.

`Architecture` — Architecture type
`Streaming Radix 2^2` (default) | `Burst Radix 2`

This parameter specifies the type of architecture.

Streaming Radix 2^2: Select this value to specify a low-latency architecture. This architecture type supports GSPS throughput when using vector input.
Burst Radix 2: Select this value to specify a minimum resource architecture. This architecture type does not support vector input.

For more details about these architectures, see Algorithms.

`Complex Multiplication` — HDL implementation
`Use 4 multipliers and 2 adders` (default) | `Use 3 multipliers and 5 adders`

This parameter specifies the complex multiplier type for HDL implementation. The block implements each complex multiplication with either four multipliers and two adders or three multipliers and five adders. The implementation speed depends on the synthesis tool and target device that you use.
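The two options trade multipliers for adders while producing the same product. The sketch below shows one common three-multiplier factorization next to the direct four-multiplier form. The source does not state which factorization the generated HDL uses, so the function names and the specific grouping of terms are assumptions for illustration.

```python
def cmul_4m2a(a, b, c, d):
    """(a + jb) * (c + jd) with 4 multiplications and 2 additions."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def cmul_3m5a(a, b, c, d):
    """Same product with 3 multiplications and 5 additions.

    Assumed factorization (one of several equivalent ones):
      k1 = c * (a + b), k2 = a * (d - c), k3 = b * (c + d)
      real = k1 - k3, imag = k1 + k2
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2
```

Both functions return the same pair for integer inputs. In hardware, the three-multiplier form saves one multiplier but its pre-adders produce sums that are one bit wider than the operands, which is one reason the achievable clock rate can differ between the two options.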

`Output in bit-reversed order` — Order of output data
on (default) | off

When you select this parameter, the block returns output elements in bit-reversed order. To return output elements in linear order, clear this parameter.

The IFFT algorithm naturally produces output in bit-reversed order relative to the input order. If you specify the output to be in the same order as the input, the algorithm performs an extra reversal operation. For more information, see Linear and Bit-Reversed Output Order.
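To make the ordering concrete, the following sketch computes the bit-reversed index sequence for a power-of-2 frame length. The function names are hypothetical, and the sketch illustrates the index permutation only, not the block's internal reordering memory.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(fft_length: int) -> list:
    """Return the index sequence in bit-reversed order for a power-of-2 length."""
    num_bits = fft_length.bit_length() - 1  # log2 of a power of 2
    return [bit_reverse(i, num_bits) for i in range(fft_length)]
```

For a length of 8, the sequence is 0, 4, 2, 6, 1, 5, 3, 7. Applying the permutation twice restores linear order, which is why a bit-reversed output can feed a downstream block that expects bit-reversed input without an extra reordering step.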

`Input in bit-reversed order` — Expected order of input data
off (default) | on

When you select this parameter, the block expects input data in bit-reversed order. By default, the check box is cleared and the input is expected in linear order.

`Divide butterfly outputs by two` — FFT scaling
on (default) | off

When you select this parameter, the block implements an overall 1/N scale factor by dividing the output of each butterfly multiplication by two. This adjustment keeps the output of the IFFT in the same amplitude range as its input. If you disable scaling, the block avoids overflow by increasing the word length by 1 bit after each butterfly multiplication. The bit increase is the same for both architectures.
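As a worked example of the two scaling options, the sketch below computes the output word length and overall scale factor. It assumes that an N-point transform has log2(N) radix-2 butterfly stages and that each stage either divides by two or adds one bit; the function names are hypothetical.

```python
def butterfly_stage_count(fft_length: int) -> int:
    """log2(N) for a power-of-2 FFT length (assumed number of butterfly stages)."""
    return fft_length.bit_length() - 1


def output_word_length(input_word_length: int, fft_length: int,
                       scaling_enabled: bool) -> int:
    """Word length after all stages under the assumptions stated above."""
    if scaling_enabled:
        return input_word_length  # each stage divides by two, so no growth
    return input_word_length + butterfly_stage_count(fft_length)


def overall_scale_factor(fft_length: int) -> float:
    """Dividing by two at each of log2(N) stages gives 1/N overall."""
    return 0.5 ** butterfly_stage_count(fft_length)
```

Under these assumptions, a 16-bit input with an FFT length of 1024 stays at 16 bits when scaling is enabled and grows to 26 bits when scaling is disabled.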

Data Types

`Rounding Method` — Rounding mode for internal fixed-point calculations
`Floor` (default) | `Ceiling` | `Convergent` | `Nearest` | `Round` | `Zero`

This parameter specifies the rounding mode for internal fixed-point calculations. When the input is any integer or fixed-point data type, the IFFT algorithm uses fixed-point arithmetic for internal calculations. Rounding applies to twiddle factor multiplication and scaling operations. This option does not apply when the input is single or double type. For more information about rounding modes, see rounding method.
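The rounding modes differ only in how they treat the bits that are discarded. The sketch below drops one least significant bit from an integer, as a divide-by-two scaling step would, under each mode. The tie-breaking rules used here (Nearest toward positive infinity, Round away from zero, Convergent toward even) are assumed from common fixed-point conventions and are not stated in this page; the function name is hypothetical.

```python
def drop_lsb(value: int, method: str) -> int:
    """Divide an integer by two and round according to the named method."""
    quotient, remainder = divmod(value, 2)  # floor quotient, remainder is 0 or 1
    if remainder == 0:
        return quotient  # exact result, all methods agree
    if method == "Floor":
        return quotient
    if method == "Ceiling":
        return quotient + 1
    if method == "Zero":
        return quotient + 1 if value < 0 else quotient
    if method == "Nearest":  # assumed: ties go toward positive infinity
        return quotient + 1
    if method == "Round":  # assumed: ties go away from zero
        return quotient if value < 0 else quotient + 1
    if method == "Convergent":  # assumed: ties go to the nearest even value
        return quotient if quotient % 2 == 0 else quotient + 1
    raise ValueError("unknown rounding method: " + method)
```

When only one bit is dropped, every inexact result is a tie, so Nearest matches Ceiling in this sketch. The modes diverge further when more bits are discarded, for example after a twiddle factor multiplication.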

Control Ports

`Enable reset input port` — Optional reset signal
off (default) | on

This parameter enables a reset input port. When you select this parameter, the input reset port appears on the block icon.

`Enable start output port` — Optional control signal indicating start of data
off (default) | on

This parameter enables a port that indicates the start of output data. When you select this parameter, the output start port appears on the block icon.

`Enable end output port` — Optional control signal indicating end of data
off (default) | on

This parameter enables a port that indicates the end of output data. When you select this parameter, the output end port appears on the block icon.

Model Examples

Implement FFT for FPGA Using FFT HDL Optimized Block

Use the FFT HDL Optimized block to implement an FFT for hardware.

Automatic Delay Matching for the Latency of FFT HDL Optimized Block

Example of delay-matching alongside the FFT HDL Optimized block.

Algorithms

Streaming Radix 2^2

The streaming Radix 2^2 architecture implements a low-latency architecture. It saves resources compared to a streaming Radix 2 implementation by factoring and grouping the FFT equation. The architecture has log₄(N) stages. Each stage contains two single-path delay feedback (SDF) butterflies with memory controllers. When you use vector input, each stage operates on fewer input samples, so some stages reduce to a simple butterfly, without SDF.

The first SDF stage is a regular butterfly. The second stage multiplies the outputs of the first stage by -j. To avoid a hardware multiplier, the block swaps the real and imaginary parts of the inputs, and again swaps the imaginary parts of the resulting outputs. Each stage rounds the result of the twiddle factor multiplication to the input word length. The twiddle factors have the same bit width as the input data, WL: two integer bits and WL-2 fractional bits.
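The multiplier-free -j step and the twiddle factor format can be sketched as follows. The identity (a + jb)(-j) = b - ja is standard. The twiddle quantization below is an assumption-laden illustration: it assumes the IFFT twiddle is exp(+j2πk/N), uses round-to-nearest, and returns raw integers scaled by 2^(WL-2). The function names are hypothetical.

```python
import math


def multiply_by_minus_j(real, imag):
    """(a + jb) * (-j) = b - ja: a swap and a negation, no multiplier."""
    return imag, -real


def quantize_twiddle(k: int, fft_length: int, word_length: int):
    """Quantize an assumed IFFT twiddle exp(+j*2*pi*k/N) to WL bits.

    Two integer bits (including sign) leave WL-2 fractional bits, so the
    values +1.0 and -1.0 are both exactly representable.
    """
    frac_bits = word_length - 2
    scale = 1 << frac_bits
    angle = 2.0 * math.pi * k / fft_length
    # Round-to-nearest is an assumption; the block's method is not stated here.
    return round(math.cos(angle) * scale), round(math.sin(angle) * scale)
```

For a 16-bit word length, the twiddle for k = 0 quantizes to (16384, 0), which is 1.0 with 14 fractional bits.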

If you enable scaling, the algorithm divides the result of each butterfly stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the algorithm avoids overflow by increasing the word length by 1 bit at each stage. The diagram shows the butterflies and internal word lengths of each stage, not including the memory.
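The claim that dividing by two at every stage yields an overall 1/N factor can be checked numerically. The sketch below is a floating-point, recursive radix-2 inverse transform, not the block's SDF pipeline or its fixed-point arithmetic, and it returns output in linear order. It halves each butterfly output and is compared against a direct inverse DFT with an explicit 1/N factor. Both function names are hypothetical.

```python
import cmath


def ifft_scaled(x):
    """Radix-2 inverse FFT that divides each butterfly output by two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = ifft_scaled(x[0::2])
    odd = ifft_scaled(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        out[k] = (even[k] + t) / 2          # per-stage scaling by 1/2
        out[k + half] = (even[k] - t) / 2
    return out


def inverse_dft(x):
    """Direct inverse DFT with a single 1/N factor, used as a reference."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]
```

For a frame of length 8, the two functions agree to within floating-point error, because three stages of division by two accumulate to 1/8.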

Burst Radix 2

The burst Radix 2 architecture implements the FFT by using a single complex butterfly multiplier. The algorithm cannot start until it has stored the entire input frame, and it cannot accept the next frame until computations are complete. The output ready port indicates when the algorithm is ready for new data. The diagram shows the burst architecture, with pipeline registers.

Control Signals

The algorithm processes input data only when the input valid port is 1. Output data is valid only when the output valid port is 1.

When the optional input reset port is 1, the algorithm stops the current calculation and clears all internal states. The algorithm begins new calculations when the reset port is 0 and the input valid port starts a new frame.

Timing Diagram

This diagram shows the input and output valid port values for contiguous scalar input data, streaming Radix 2^2 architecture, an FFT length of 1024, and a vector size of 16.

The diagram also shows the optional start and end port values that indicate frame boundaries. If you enable the start port, the start port value pulses for one cycle with the first valid output of the frame. If you enable the end port, the end port value pulses for one cycle with the last valid output of the frame.
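The start and end behavior can be modeled from the output valid signal alone. The following sketch is a behavioral illustration, not the generated HDL; `frame_markers` and its `cycles_per_frame` argument are hypothetical, and the caller supplies the number of valid cycles in one output frame.

```python
def frame_markers(valid, cycles_per_frame):
    """Derive one-cycle start and end pulses from a list of output valid flags."""
    start, end = [], []
    count = 0  # number of valid cycles seen so far in the current frame
    for v in valid:
        start.append(bool(v) and count == 0)
        end.append(bool(v) and count == cycles_per_frame - 1)
        if v:
            count = (count + 1) % cycles_per_frame
    return start, end
```

With four valid cycles per frame and a valid sequence of 0, 1, 1, 1, 1, 0, the start pulse lands on the second cycle and the end pulse on the fifth.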

If you apply continuous input frames, the output will also be continuous after the initial latency.

The input valid port can be noncontiguous. The block processes each data sample that is accompanied by a true (1) input valid signal as it arrives, and stores the resulting data until a frame is filled. Then the algorithm returns contiguous output samples in a frame of N (FFT length) cycles. This diagram shows noncontiguous input and contiguous output for an FFT length of 512 and a vector size of 16.
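A small behavioral model helps show how gaps in the input valid signal affect framing. The sketch below only groups accepted samples into frames; it does not compute the transform or model latency, and `collect_frames` is a hypothetical name.

```python
def collect_frames(samples, valid, frame_length):
    """Group samples whose valid flag is true into complete frames."""
    frames = []
    current = []
    for sample, is_valid in zip(samples, valid):
        if not is_valid:
            continue  # samples without valid are ignored
        current.append(sample)
        if len(current) == frame_length:
            frames.append(current)  # a full frame is ready for processing
            current = []
    return frames
```

Samples that arrive with valid deasserted never enter a frame, and a partially filled frame produces no output until the remaining samples arrive.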

When you use the burst architecture, you cannot provide the next frame of input data until memory space is available. The ready port indicates when the algorithm can accept new input data.

Latency

The latency varies with the FFT length and input vector size. After you update the model, the block icon displays the latency. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous. To obtain this latency programmatically, see Automatic Delay Matching for the Latency of FFT HDL Optimized Block.

When using the burst architecture with a contiguous input, if your design waits for ready to go to 0 before deasserting the input valid, then one extra cycle of data arrives at the input. This data sample is the first sample of the next frame. The algorithm can save one sample while processing the current frame. Because of this one-sample advance, the observed latency of later frames (from input valid to output valid) is one cycle shorter than the reported latency. The latency is measured from the first cycle when input valid is 1 to the first cycle when output valid is 1. The number of cycles between when the ready port goes to 0 and when the output valid port goes to 1 is always latency - FFTLength.
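The two burst-mode relationships in this paragraph reduce to simple arithmetic, sketched below with hypothetical function names. The example values in the note that follows come from the burst configuration reported in the Performance section of this page.

```python
def cycles_from_ready_low_to_output_valid(latency: int, fft_length: int) -> int:
    """Cycles between ready going to 0 and output valid going to 1."""
    return latency - fft_length


def observed_latency_of_later_frames(latency: int) -> int:
    """Later frames appear one cycle faster because of the one-sample advance."""
    return latency - 1
```

For a reported latency of 5811 cycles and an FFT length of 1024, the gap between ready going low and output valid going high is 4787 cycles, and later frames show an observed latency of 5810 cycles.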

Performance

This resource and performance data is the synthesis result from the generated HDL targeted to a Xilinx® Virtex®-6 (XC6VLX75T-1FF484) FPGA. The examples in the tables have this configuration:

1024 FFT length (default)
Complex multiplication using 4 multipliers, 2 adders
Output scaling enabled
Natural order input, Bit-reversed output
16-bit complex input data
Clock enables minimized (HDL Coder™ parameter)

Performance of the synthesized HDL code varies with your target and synthesis options. For instance, reordering for a natural-order output uses more RAM than the default bit-reversed output, and real input uses less RAM than complex input.

For a scalar input Radix 2^2 configuration, the design achieves 326 MHz clock frequency. The latency is 1116 cycles. The design uses these resources.

Resource	Number Used
LUT	4597
FFS	5353
Xilinx LogiCORE® DSP48	12
Block RAM (16K)	6

When you vectorize the same Radix 2^2 implementation to process two 16-bit input samples in parallel, the design achieves 316 MHz clock frequency. The latency is 600 cycles. The design uses these resources.

Resource	Number Used
LUT	7653
FFS	9322
Xilinx LogiCORE DSP48	24
Block RAM (16K)	8

The burst Radix 2 architecture supports only scalar input data. The burst design achieves 309 MHz clock frequency. The latency is 5811 cycles. The design uses these resources.

Resource	Number Used
LUT	971
FFS	1254
Xilinx LogiCORE DSP48	3
Block RAM (16K)	6

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

This block supports C/C++ code generation for Simulink® accelerator and rapid accelerator modes and for DPI component generation.

HDL Code Generation
Generate Verilog and VHDL code for FPGA and ASIC designs using HDL Coder™.

HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

This block has a single, default HDL architecture.

HDL Block Properties

ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline (HDL Coder).
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline (HDL Coder).
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline (HDL Coder).

Restrictions

If you use the IFFT HDL Optimized block with the State Control (HDL Coder) block inside an Enabled Subsystem (Simulink), the optional reset port is not supported. If you enable the reset port on the IFFT HDL Optimized block in such a subsystem, the model reports an error when you update the diagram.
