HDL Coder™ native floating-point technology can generate HDL code from your floating-point design. Native floating-point operators have a latency. When you generate HDL code, the code generator determines this latency and adds matching delays to balance parallel paths.
Open the hdlcoder_nfp_delay_allocation Simulink™ model. The model uses single data types and computes the square root. The model has a parallel path to illustrate how the code generator balances delays.
load_system('hdlcoder_nfp_delay_allocation')
open_system('hdlcoder_nfp_delay_allocation/DUT')
To generate HDL code:
Right-click the DUT Subsystem and select HDL Code > Generate HDL for Subsystem.
To see the generated model after HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation.
The NFP Sqrt block is the floating-point operator corresponding to the Sqrt block in your model and has a latency of 28. The code generator determines this latency and adds a matching delay of length 28 in the parallel path. To see the latency of the square root operation, double-click the NFP Sqrt block. The Delay length of the Sqrt_pd1 block corresponds to the operator latency.
You can customize the latency of your design. Use custom latency settings to design for trade-offs between latency and throughput. You can then optimize your design implementation on the target FPGA device for area and speed. Customize the latency by using:
Latency Strategy setting: Specify whether to map your entire Simulink™ model or individual blocks in your model to maximum, minimum, or zero latency of the floating-point operator.
Custom Latency: You can specify a custom latency for certain blocks that you use in your Simulink™ model. The custom latency setting can take values from zero to the maximum latency of the floating-point operator.
Oversampling factor: Increasing the Oversampling factor runs the design at a faster clock rate, and HDL Coder™ absorbs the clock-rate pipelines into the latency of the floating-point operator.
Delay blocks in the model: If your Simulink model contains Delay blocks, HDL Coder™ can absorb some or all of that latency into the native floating-point implementation.
You can specify the latency strategy setting for an entire model or for individual blocks in your model.
To specify this setting for a model:
In the hdlcoder_nfp_delay_allocation model, right-click the DUT Subsystem and select HDL Code > HDL Coder Properties.
On the HDL Code Generation > Global Settings > Floating Point Target tab, for Library, select Native Floating Point, and then for Latency Strategy, select MAX, MIN, or ZERO.
To specify this setting from the command line:
Create a hdlcoder.FloatingPointTargetConfig object for native floating point by using the hdlcoder.createFloatingPointTargetConfig function.
nfpconfig = hdlcoder.createFloatingPointTargetConfig('NATIVEFLOATINGPOINT');
hdlset_param('hdlcoder_nfp_delay_allocation', 'FloatingPointTargetConfiguration', nfpconfig);
Specify the latency strategy by using the LatencyStrategy property in the LibrarySettings of the nfpconfig object.
nfpconfig.LibrarySettings.LatencyStrategy = 'MAX'
nfpconfig = 
  FloatingPointTargetConfig with properties:
            Library: 'NativeFloatingPoint'
    LibrarySettings: [1x1 fpconfig.NFPLatencyDrivenMode]
           IPConfig: [1x1 hdlcoder.FloatingPointTargetConfig.IPConfig]
To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation.
For blocks in your Simulink™ model, you can selectively customize the latency strategy. By default, the blocks inherit the latency strategy setting you specify for the model. For certain blocks, you can specify a custom latency value that is between zero and the maximum latency of the floating-point operator.
By specifying a custom latency, you can customize your design for trade-offs between:
Clock frequency and power consumption: A higher latency value increases the maximum clock frequency (Fmax) that you can achieve, which increases the dynamic power consumption.
Oversampling factor and sampling frequency: Combining a higher latency value with a higher oversampling factor increases the Fmax that you can achieve but reduces the sampling frequency.
To learn more about this setting and how to specify the latency strategy for a block, see LatencyStrategy.
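To make the block-level rules concrete, the sketch below models how a block's latency strategy might resolve to a latency value: a block inherits the model setting by default, and a custom value must lie between zero and the operator's maximum latency. This is a hypothetical model, not the HDL Coder API. The function name effective_latency, the "Inherit" keyword, and the operator latencies used in the usage lines (maximum 10, minimum 4) are illustrative assumptions.

```python
from typing import Optional

# Hypothetical model of latency-strategy resolution (NOT the HDL Coder API).
# Strategy names mirror the MAX, MIN, ZERO, and Custom options described above.

def effective_latency(block_strategy: str, model_strategy: str,
                      max_latency: int, min_latency: int,
                      custom: Optional[int] = None) -> int:
    """Return the operator latency, in clock cycles, implied by the strategy settings."""
    # Assumption: a block set to "Inherit" uses the model-level strategy.
    strategy = model_strategy if block_strategy.upper() == "INHERIT" else block_strategy
    strategy = strategy.upper()
    if strategy == "MAX":
        return max_latency
    if strategy == "MIN":
        return min_latency
    if strategy == "ZERO":
        return 0
    if strategy == "CUSTOM":
        # A custom latency must lie between zero and the operator's maximum latency.
        if custom is None or not 0 <= custom <= max_latency:
            raise ValueError(f"custom latency must be between 0 and {max_latency}")
        return custom
    raise ValueError(f"unknown latency strategy: {strategy}")


# Hypothetical operator with a maximum latency of 10 and a minimum latency of 4.
print(effective_latency("Inherit", "MAX", max_latency=10, min_latency=4))           # 10
print(effective_latency("Custom", "MAX", max_latency=10, min_latency=4, custom=2))  # 2
```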
For example, if you have an Add block in the parallel path in your model, you can specify a custom latency value of 2 for the Add block by entering these commands.
load_system('hdlcoder_nfp_delay_allocation_custom')
open_system('hdlcoder_nfp_delay_allocation_custom')
% Use a custom latency strategy for the Add block and set its latency to 2
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add','LatencyStrategy','Custom')
hdlset_param('hdlcoder_nfp_delay_allocation_custom/DUT/Add','NFPCustomLatency',2)
To see the latency information, generate HDL code and then open the generated model. To open the generated model, enter the command gm_hdlcoder_nfp_delay_allocation_custom. In the generated model, you see that the NFP Add block has a latency of 2.
When you design the blocks in your Simulink™ model at the data rate, specify an Oversampling factor greater than one. With oversampling, clock-rate pipelining inserts pipeline registers at the faster clock rate, which improves clock frequency and reduces area usage. To learn more about clock-rate pipelining, see Clock-Rate Pipelining.
To see the effect of Oversampling factor on the model, in the hdlcoder_nfp_delay_allocation model:
Add a Delay block with Delay length 1 at the output of the Sqrt block.
Right-click the DUT and select HDL Code > HDL Coder Properties.
On the HDL Code Generation > Global Settings pane, enter a value of 40 for Oversampling factor.
After HDL code generation, the generated model shows the NFP Sqrt block operating at a clock rate that is 40 times faster than the Sqrt block in your model. The NFP Sqrt block absorbs the Delay block from your Simulink™ model, and that delay now operates at the faster clock rate. This implementation saves area by absorbing the additional latency and improves timing by operating at the faster clock rate.
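A simplified way to reason about this result: when the clock runs Oversampling factor times faster than the data rate, one data-rate delay spans that many clock-rate cycles, so an operator whose latency fits within those cycles can hide inside the delay. The sketch below models only that arithmetic under this assumption; the function names are hypothetical, and the allocation that HDL Coder actually performs may differ.

```python
# Simplified clock-rate arithmetic (hypothetical helper names; not the HDL Coder API).

def clock_cycles_available(data_rate_delay: int, oversampling: int) -> int:
    """Clock-rate cycles spanned by a data-rate delay when the clock runs `oversampling` times faster."""
    return data_rate_delay * oversampling


def fits_within_delay(operator_latency: int, data_rate_delay: int, oversampling: int) -> bool:
    """True if the operator latency (in clock cycles) fits inside the data-rate delay."""
    return operator_latency <= clock_cycles_available(data_rate_delay, oversampling)


# Delay length 1 at an Oversampling factor of 40 spans 40 clock cycles,
# which covers the 28-cycle latency of the square root operator.
print(clock_cycles_available(1, 40))  # 40
print(fits_within_delay(28, 1, 40))   # True
print(fits_within_delay(28, 1, 1))    # False
```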
If your Simulink™ model has a Delay block with sufficient Delay length adjacent to an operator, HDL Coder™ absorbs the delays as part of the operator latency.
Note: To absorb delays, make sure that you group the delays adjacent to the block.
If the Delay length is equal to the latency of the floating-point operator, HDL Coder™ absorbs the delays and does not introduce any additional latency.
In the hdlcoder_nfp_delay_allocation model:
Double-click the Delay block at the output of the Sqrt block and change the Delay length to 28.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.
In the generated model, you see that the NFP Sqrt block absorbs the Delay block adjacent to the Sqrt block in your original model. Because the Delay length equals the operator latency, the code generator does not add any latency to your model.
If the Delay length is less than the operator latency, HDL Coder™ absorbs the available delays and balances parallel paths by adding matching delays.
In the hdlcoder_nfp_delay_allocation model:
Double-click the Delay block at the output of the Sqrt block and change the Delay length to 21.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.
You see that the NFP Sqrt block absorbs the Delay of length 21, and the code generator adds a matching delay of length 7 in the parallel path, because the square root operation has a latency of 28 (28 − 21 = 7).
If the Delay length is greater than the operator latency, the code generator absorbs a number of delays equal to the operator latency, and the excess delays remain outside the operator.
In the hdlcoder_nfp_delay_allocation model:
Double-click the Delay block at the output of the Sqrt block and change the Delay length to 34.
Generate HDL code for the DUT Subsystem.
After HDL code generation, at the command line, enter gm_hdlcoder_nfp_delay_allocation to open the generated model.
The NFP Sqrt block absorbs 28 delays because the square root operation has a latency of 28. The remaining 6 delays (34 − 28) stay outside the operator.