Model Design for AXI4 Master Interface Generation

For designs that require accessing large data sets from an external memory, model your algorithm with a simplified AXI4 Master protocol. When you run the IP Core Generation workflow, HDL Coder™ generates an IP core with AXI4 Master interfaces. The AXI4 Master interface can communicate between your design and the external memory controller IP by using the AXI4 Master protocol. Use the AXI4 Master interface when your:

  • Design targets multi-frame video processing applications. You can store the image data in external memory, such as onboard DDR3 memory, and then read images into your design or write them out in bursts for high-speed processing.

  • Algorithm must access memory data in a non-streaming arbitrary pattern.

  • DUT IP core must control other IPs with the AXI4 slave interface in the system. This capability is especially useful in standalone FPGA devices.

Simplified AXI4 Master Protocol - Write Channel

To map the DUT ports to AXI4 Master interfaces, use the simplified AXI4 Master protocol instead of modeling the full AXI4 Master protocol. When you run the IP Core Generation workflow, the generated HDL code contains wrapper logic that translates between the simplified protocol and the full AXI4 Master protocol. The simplified protocol requires fewer protocol signals, simplifies the handshaking between the valid and ready signals, and supports bursts of arbitrary length.

Use the simplified AXI4 Master write protocol for a write transaction and the simplified AXI4 Master read protocol for a read transaction. This figure shows the timing diagram for the signals that you model at the DUT input and output interfaces for an AXI4 Master write transaction.

The DUT waits for wr_ready to become high before it initiates a write request. When wr_ready is high, the DUT can send the write request, which consists of the Data signal and the Write Master to Slave bus signals wr_len, wr_addr, and wr_valid. wr_addr specifies the starting address that the DUT writes to, and wr_len specifies the number of data elements in the write transaction. Data is sent while wr_valid is high. When wr_ready becomes low, the DUT must stop sending data within one clock cycle, and the Data signal becomes invalid. Any data that the DUT sends more than one clock cycle after wr_ready goes low is ignored.

Output Signals

Model the Data and Write Master to Slave bus signals at the DUT output interface.

  • Data: The data that you want to transfer, valid each cycle of the transaction.

  • Write Master to Slave bus that consists of:

    • wr_addr: Starting address of the write transaction that is sampled at the first cycle of the transaction. The address is specified in bytes.

    • wr_len: The number of data values that you want to transfer, sampled at the first cycle of the transaction. The wr_len signal is specified in words: each unit of wr_len is one complete data element. For example, when wr_len is 2 and the data bit width is 128 bits, two 128-bit data elements are written.

    • wr_valid: When this control signal becomes high, it indicates that the Data signal sampled at the output is valid.
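Because wr_addr is given in bytes but wr_len is given in data elements, the byte range that one write request covers depends on the data width. The following minimal sketch computes that range. It assumes that consecutive data elements go to consecutive, incrementing addresses and that the data width is a whole number of bytes; the function name write_byte_range is hypothetical and not part of HDL Coder.

```python
from typing import Tuple


def write_byte_range(wr_addr: int, wr_len: int, data_width_bits: int) -> Tuple[int, int]:
    """Return the first and last byte addresses touched by one write request.

    Assumes consecutive data elements land at consecutive, incrementing
    addresses and that the data width is a whole number of bytes.
    """
    if data_width_bits % 8:
        raise ValueError("data width must be a multiple of 8 bits")
    bytes_per_word = data_width_bits // 8
    # wr_len counts words, so scale it by the word size to get bytes.
    return wr_addr, wr_addr + wr_len * bytes_per_word - 1


# wr_len = 2 with 128-bit (16-byte) data, as in the wr_len example above
first, last = write_byte_range(0x1000, 2, 128)
print(hex(first), hex(last))  # 0x1000 0x101f
```

The same arithmetic applies to rd_addr and rd_len on the read channel.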

Input Signals

Model the Write Slave to Master bus that consists of:

  • wr_complete (optional signal): Control signal that, when it stays high for one clock cycle, indicates that the write transaction has completed. The next burst of data can be sent after wr_complete is asserted. Because wr_complete is asserted early, the average latency between two bursts is nearly 3 clock cycles, which pipelines the write operation and improves write throughput.

  • wr_ready: This signal corresponds to the back pressure from the slave IP core or external memory. When this control signal is high, data can be sent. When wr_ready is low, the DUT must stop sending data within one clock cycle. You can also use wr_ready to determine whether the DUT can send a second burst immediately after the first burst has been sent. Back-to-back bursts are supported: wr_ready remains high to accept the second burst immediately after the last element of the first burst has been accepted. Using wr_ready to decide when to start the next burst can reduce the average latency between two bursts to less than 3 clock cycles.

  • wr_bvalid (optional signal): Response signal from the slave IP core that you can use for diagnostic purposes. wr_bvalid becomes high after the AXI4 interconnect accepts each burst transaction. If wr_len is greater than 256, the AXI4 Master write module splits the large burst into bursts of at most 256 data elements, and wr_bvalid becomes high for each of these bursts.

  • wr_bresp (optional signal): Response signal from the slave IP core that you can use for diagnostic purposes. Use this signal together with the wr_bvalid signal.

The AXI4 Master protocol supports a maximum burst length of 256 data transfers. When a burst is longer than 256, the AXI4 Master interface in the generated HDL IP core divides it into multiple smaller bursts of at most 256 transfers. As a result, you still get improved write throughput even for large bursts of data.
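To see how a long simplified-protocol request maps onto AXI4 bursts, the following sketch splits one request into (address, length) pairs of at most 256 transfers each. It is an illustration under stated assumptions, not the wrapper logic that HDL Coder generates: it assumes incrementing addresses, and it ignores other AXI4 rules such as the 4 KB address-boundary restriction. The function name split_burst is hypothetical.

```python
MAX_AXI4_BURST = 256  # AXI4 limit on data transfers per burst


def split_burst(addr, length, bytes_per_word, max_burst=MAX_AXI4_BURST):
    """Split one simplified-protocol request into (address, length) AXI4 bursts.

    Assumes incrementing addresses. Ignores other AXI4 rules, such as the
    4 KB address-boundary restriction.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    bursts = []
    while length > 0:
        n = min(length, max_burst)
        bursts.append((addr, n))
        # The next burst starts right after the bytes of this one.
        addr += n * bytes_per_word
        length -= n
    return bursts


# 600 words of 128 bits (16 bytes) starting at byte address 0
bursts = split_burst(0, 600, 16)
print(bursts)       # [(0, 256), (4096, 256), (8192, 88)]
print(len(bursts))  # 3 sub-bursts, so 3 wr_bvalid pulses under this model
```

Under this model, the number of sub-bursts is also the number of wr_bvalid pulses that you would expect to observe for one large write request.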

Simplified AXI4 Master Protocol - Read Channel

This figure shows the timing diagram for the signals that you model at the DUT input and output interfaces for an AXI4 Master read transaction. These signals include the Data, Read Master to Slave Bus, and Read Slave to Master Bus.

The DUT waits for rd_aready to become high to initiate a read request. When rd_aready is high, the DUT can send out the read request. The read request consists of the rd_addr, rd_len, and rd_avalid signals of the Read Master to Slave bus. The slave IP or the external memory responds to the read request by sending the Data at each clock cycle. The rd_len signal corresponds to the number of data values to read. The DUT can receive Data as long as rd_dvalid is high.

Read Request

To model a read request, at the DUT output interface, model the Read Master to Slave bus that consists of:

  • rd_addr: Starting address for the read transaction that is sampled at the first cycle of the transaction. The address is specified in bytes.

  • rd_len: The number of data values that you want to read, sampled at the first cycle of the transaction. The rd_len signal is specified in words: each unit of rd_len is one complete data element. For example, when rd_len is 2 and the data bit width is 128 bits, two 128-bit data elements are read.

  • rd_avalid: Control signal that specifies whether the read request is valid.

At the DUT input interface, implement the rd_aready signal. This signal is part of the Read Slave to Master bus and indicates when to accept read requests. You can monitor the rd_aready signal to determine whether the DUT can send consecutive burst requests. When rd_aready becomes high, it indicates that the DUT can send a read request in the next clock cycle.

Read Response

At the DUT input interface, model the Data and Read Slave to Master bus signals.

  • Data: The data that is returned from the read request.

  • Read Slave to Master bus that consists of:

    • rd_dvalid: Control signal which indicates that the Data returned from the read request is valid.

    • rd_rvalid (optional signal): Response signal from the slave IP core that you can use for diagnostic purposes.

    • rd_rresp (optional signal): Response signal from the slave IP core that indicates the status of the read transaction.

At the DUT output interface, you can optionally implement the rd_dready signal. This signal is part of the Read Master to Slave bus and indicates when the DUT can start accepting data. By default, if you do not map this signal to the AXI4 Master read interface, the generated HDL IP core ties rd_dready to logic high.

Base Address Register Calculation

For IP cores that you generate, HDL Coder includes a base address register for both the AXI4 Master read and write channels to support driver authoring. The value of the base address register is added to the address specified by the DUT ADDR port to form the AXI4 Master address. With this addressing mode, the driver programs a fixed register with the base address of a buffer, and the DUT ADDR port indexes into that buffer. If you do not use the registers, they default to zero.
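The address calculation above is a simple sum. The following sketch shows it, assuming a 32-bit address bus and wrap-around (modulo) addition; both the width and the wrap-around behavior are assumptions of this sketch, and the function name axi4_master_address is hypothetical.

```python
ADDR_WIDTH = 32  # assumed AXI4 address width


def axi4_master_address(base_addr_reg, dut_addr, addr_width=ADDR_WIDTH):
    """Sum the base address register and the DUT ADDR port value.

    The modulo-2**addr_width wrap-around is an assumption of this sketch.
    """
    return (base_addr_reg + dut_addr) & ((1 << addr_width) - 1)


# The driver writes a buffer's base address into the register once...
buffer_base = 0x2000_0000  # hypothetical buffer location
# ...and the DUT only drives offsets into that buffer.
print(hex(axi4_master_address(buffer_base, 0x40)))  # 0x20000040
# With the register left at its default of zero, the DUT address is used as-is.
print(hex(axi4_master_address(0, 0x40)))  # 0x40
```

This separation lets the same DUT logic work on any buffer that the driver allocates, because only the register value changes.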

Modeling for AXI4 Master Interfaces

You can model your algorithm with Data and AXI4 Master protocol signals at the DUT ports and then map the signals to AXI4 Master interfaces.

To learn how to model your DUT algorithm for AXI4 Master interface mapping, open this Simulink® model. The DUT Subsystem contains a simple algorithm that reads data from DDR memory and writes the data back to a different address in DDR memory.

Double-click the DUT Subsystem. The DDR_Access_Controller Subsystem models the AXI4 Master read and write channels and has a Simple Dual Port RAM that calculates the wr_data signal. If you double-click the DDR_Access_Controller Subsystem, you see two Edge Detection Subsystem blocks that generate the start pulses for the two MATLAB Function blocks. One Edge Detection Subsystem and the DDR Read Controller MATLAB Function block model the read transaction. The other Edge Detection Subsystem and the DDR Write Controller MATLAB Function block model the write transaction. To model only the write transaction or only the read transaction, use one Edge Detection Subsystem and the corresponding MATLAB Function block.

Read Channel

The DDR Read Controller is modeled as a state machine with four states: INIT, IDLE, READ_BURST_START, and DATA_COUNT. The INIT state initializes the read signals and the RAM input signals. When the start signal goes high, the state machine switches to the IDLE state and waits for the rd_aready signal to become high. When rd_aready becomes high, the state machine transitions to the READ_BURST_START state and the DUT starts reading data. The state machine then unconditionally switches to the DATA_COUNT state and continues to read data until rd_avalid goes low.
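The read state machine described above can be sketched as a cycle-level Python model, where each call to step represents one clock cycle. This is a hypothetical illustration, not the MATLAB Function block code in the example model. The class and method names, the toy slave that replays data from a dictionary, and the choice to count rd_dvalid beats and return to INIT after the last one are all assumptions made for the sketch.

```python
from enum import Enum, auto


class RdState(Enum):
    INIT = auto()
    IDLE = auto()
    READ_BURST_START = auto()
    DATA_COUNT = auto()


class DDRReadController:
    """Hypothetical cycle-level sketch of the DDR Read Controller state machine."""

    def __init__(self, rd_addr, rd_len):
        self.state = RdState.INIT
        self.rd_addr = rd_addr
        self.rd_len = rd_len
        self.count = 0
        self.received = []

    def step(self, start, rd_aready, rd_dvalid, data):
        """Advance one clock cycle and return the Read Master to Slave outputs."""
        # rd_dready is held high, like the default when it is not mapped.
        out = {"rd_addr": 0, "rd_len": 0, "rd_avalid": False, "rd_dready": True}
        if self.state is RdState.INIT:
            if start:
                self.state = RdState.IDLE
        elif self.state is RdState.IDLE:
            # Wait for the slave to accept read requests.
            if rd_aready:
                self.state = RdState.READ_BURST_START
        elif self.state is RdState.READ_BURST_START:
            # Issue the read request for one cycle, then move on unconditionally.
            out.update(rd_addr=self.rd_addr, rd_len=self.rd_len, rd_avalid=True)
            self.count = 0
            self.state = RdState.DATA_COUNT
        elif self.state is RdState.DATA_COUNT:
            # Assumption: count rd_dvalid beats and return to INIT after the last one.
            if rd_dvalid:
                self.received.append(data)
                self.count += 1
                if self.count == self.rd_len:
                    self.state = RdState.INIT
        return out


# Toy slave: replays words from a dict, one beat per cycle after the request.
mem = {0x100 + 16 * i: 10 * i for i in range(4)}  # hypothetical 16-byte words
ctrl = DDRReadController(rd_addr=0x100, rd_len=4)
pending = []
for cycle in range(12):
    dvalid = bool(pending)
    data = pending.pop(0) if pending else None
    out = ctrl.step(start=(cycle == 1), rd_aready=True, rd_dvalid=dvalid, data=data)
    if out["rd_avalid"]:
        pending = [mem[out["rd_addr"] + 16 * k] for k in range(out["rd_len"])]

print(ctrl.received)  # [0, 10, 20, 30]
```

Note that the request is issued one cycle after rd_aready is seen high, which matches the description of rd_aready in the Read Request section.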

Write Channel

The DDR Write Controller is modeled, like the read channel, as a state machine with four states: IDLE, WRITE_BURST_START, DATA_COUNT, and ACK_WAIT. The DUT starts in the IDLE state and then switches to the WRITE_BURST_START state, where it waits for the wr_ready signal. When wr_ready becomes high, the state machine switches to the DATA_COUNT state and starts writing data. The data is valid when wr_valid is high, and the DUT continues to write data while wr_ready is high. When wr_ready becomes low, the state machine switches to the ACK_WAIT state and then waits for the ready signal before it initiates the next write transaction.
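The write state machine can be sketched the same way. This hypothetical cycle-level model is not the example model's MATLAB code. One deliberate difference from the description above: the description moves to ACK_WAIT when wr_ready falls, whereas this sketch pauses in DATA_COUNT while wr_ready is low and enters ACK_WAIT only after the last beat. That choice, the start input, the return to IDLE, and all class and method names are assumptions. The sketch reacts to wr_ready in the same cycle, which satisfies the requirement to stop within one clock cycle.

```python
from enum import Enum, auto


class WrState(Enum):
    IDLE = auto()
    WRITE_BURST_START = auto()
    DATA_COUNT = auto()
    ACK_WAIT = auto()


class DDRWriteController:
    """Hypothetical cycle-level sketch of the DDR Write Controller state machine."""

    def __init__(self, wr_addr, words):
        if not words:
            raise ValueError("need at least one word to write")
        self.state = WrState.IDLE
        self.wr_addr = wr_addr
        self.words = list(words)
        self.sent = 0

    def step(self, start, wr_ready):
        """Advance one clock cycle and return Data plus the Write Master to Slave bus."""
        out = {"data": 0, "wr_addr": self.wr_addr,
               "wr_len": len(self.words), "wr_valid": False}
        if self.state is WrState.IDLE:
            if start:
                self.sent = 0
                self.state = WrState.WRITE_BURST_START
        elif self.state is WrState.WRITE_BURST_START:
            # Wait for the slave to signal that it can accept data.
            if wr_ready:
                self.state = WrState.DATA_COUNT
        elif self.state is WrState.DATA_COUNT:
            # Back pressure: drive a beat only in cycles where wr_ready is high,
            # so data stops in the same cycle that wr_ready goes low.
            if wr_ready:
                out.update(data=self.words[self.sent], wr_valid=True)
                self.sent += 1
                if self.sent == len(self.words):
                    self.state = WrState.ACK_WAIT
        elif self.state is WrState.ACK_WAIT:
            # Wait for the ready signal before the next write transaction.
            if wr_ready:
                self.state = WrState.IDLE
        return out


ctrl = DDRWriteController(wr_addr=0x200, words=[1, 2, 3, 4])
ready = [True, True, True, True, False, False, True, True, True]
written = []
for cycle, rdy in enumerate(ready):
    out = ctrl.step(start=(cycle == 0), wr_ready=rdy)
    if out["wr_valid"]:
        written.append(out["data"])
print(written, ctrl.state)  # [1, 2, 3, 4] WrState.IDLE
```

In the usage example, wr_ready drops for two cycles in the middle of the burst; no beats are driven in those cycles, and the remaining data resumes when wr_ready returns high.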

To see the simplified AXI4 Master protocol in effect, simulate the model. If you have DSP System Toolbox™ installed, you can view and analyze the results in the Logic Analyzer.

You can use the IP Core Generation workflow to generate an HDL IP core with the AXI4 Master interface. If you have HDL Verifier™ installed and you use the Xilinx Zynq ZC706 board, you can integrate the IP core into the Default System with External DDR3 Memory Access reference design.

Map Vector Ports to AXI4 Master Interfaces

To integrate your HDL IP core into larger reference designs, and to achieve higher throughput when you use the AXI4 Master port to access external DDR memory, you may want to use larger bit widths on the Data port. The AXI4 Master interface bus supports a maximum bit width of 1024 bits.

Simulink® supports fixed-point data types that have word lengths of up to 128 bits. To model your DUT ports with word lengths greater than 128 bits, use vector data types. If you use a vector port such that the combined bit width of all the elements in the vector is greater than 1024 bits, the Set Target Interface task displays an error.

For example, in the hdlcoder_axi_master model, to expand the bit width of the axim_rd_data port to 512 bits, change the ddr_data parameter inside the DDR to fi(([40:-1:1]),0,128,0), and then concatenate the 128-bit input four times to generate a 512-bit output. You can use a Vector Concatenate block to produce the combined 512-bit output. To simulate the model, replace the Simple Dual Port RAM block inside the DUT subsystem with a Simple Dual Port RAM System block.

Concatenating the scalar input at the data port of the hdlcoder_axi_master model to convert a 128-bit scalar input to a 512-bit vector input.

You can then map these DUT Data ports to AXI4 Master Read or AXI4 Master Write ports in the Target platform interface table, generate the HDL IP core, and integrate the IP core into your Vivado® or Qsys reference designs. In the generated HDL code for the DUT IP core, the Data ports are mapped to 512-bit interfaces, and a FIFO block is generated for each element of the vector input.

ENTITY DUT_ip IS
  PORT( IPCORE_CLK            :   IN    std_logic;  -- ufix1
        IPCORE_RESETN         :   IN    std_logic;  -- ufix1
        AXI4_Master_Rd_RDATA  :   IN    std_logic_vector(511 DOWNTO 0);  -- ufix512
        
        ...
        ...

        AXI4_Master_Wr_WDATA  :   OUT   std_logic_vector(511 DOWNTO 0);  -- ufix512
        
        ...

        );
END DUT_ip;

This figure illustrates the order in which vector data is written to and read from external memory.

The endianness of data written to and read from an external memory such as DDR when you use vector ports.

In the HDL code for the DUT IP core, you can see how the AXI4_Master_Rd_RDATA and AXI4_Master_Wr_WDATA interfaces are mapped to the DUT ports and the order in which data is written to the AXI4 Master interface and then read back.

...
...
--------------------------------------------------------------------
-- AXI4 Master Read Sequence
--------------------------------------------------------------------
AXI4_Master_Rd_RDATA_0 <= AXI4_Master_Rd_RDATA_unsigned(127 DOWNTO 0);

AXI4_Master_Rd_RDATA_1 <= AXI4_Master_Rd_RDATA_unsigned_1(255 DOWNTO 128);

AXI4_Master_Rd_RDATA_2 <= AXI4_Master_Rd_RDATA_unsigned_7(383 DOWNTO 256);

AXI4_Master_Rd_RDATA_3 <= AXI4_Master_Rd_RDATA_unsigned_7(511 DOWNTO 384);

--------------------------------------------------------------------
-- AXI4 Master Write Sequence
--------------------------------------------------------------------
AXI4_Master_Wr_WDATA_tmp <= unsigned(AXI4_Master_Wr_WDATA_Vec_3) & 
                            unsigned(AXI4_Master_Wr_WDATA_Vec_2) &
                            unsigned(AXI4_Master_Wr_WDATA_Vec_1) &
                            unsigned(AXI4_Master_Wr_WDATA_Vec_0);


AXI4_Master_Wr_WDATA <= std_logic_vector(AXI4_Master_Wr_WDATA_tmp);

...
...
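The read and write sequences in the VHDL excerpt place vector element 0 in bits 127 DOWNTO 0 and element 3 in bits 511 DOWNTO 384. The following Python sketch reproduces that packing and slicing with integers so that you can check an expected bus value offline. The names pack_wdata and unpack_rdata are hypothetical, and the sketch silently truncates any element wider than 128 bits.

```python
ELEM_BITS = 128
NUM_ELEMS = 4
MASK = (1 << ELEM_BITS) - 1


def pack_wdata(vec):
    """Build the 512-bit WDATA word: element i occupies bits 128*i+127 .. 128*i."""
    word = 0
    for i, elem in enumerate(vec):
        # Mirrors Vec_3 & Vec_2 & Vec_1 & Vec_0: element 0 ends up lowest.
        word |= (elem & MASK) << (i * ELEM_BITS)
    return word


def unpack_rdata(word):
    """Slice RDATA back into elements, element 0 from bits 127 .. 0."""
    return [(word >> (i * ELEM_BITS)) & MASK for i in range(NUM_ELEMS)]


vec = [0x11, 0x22, 0x33, 0x44]
word = pack_wdata(vec)
assert word.bit_length() <= NUM_ELEMS * ELEM_BITS
print(unpack_rdata(word) == vec)  # True
```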

If you use a nonstandard bit width for the AXI4 Master Data port, the Data port is extended to a larger standard bit-width container. Standard bit widths include 32, 64, 128, 256, 512, and 1024 bits. For example, if you use a vector that has four 35-bit elements, the resulting bit width of 140 bits (4 x 35) is mapped to a 256-bit AXI4 Master interface, and the data occupies bits 139 to 0. At the Write channel Data port, bits 255 to 140 are padded with zeros. At the Read channel Data port, bits 255 to 140 are ignored.

Zero-padding logic when writing data, and how the padded bits are ignored when reading data.

Using nonstandard bit widths can reduce performance because the full bandwidth of the AXI4 Master interface is not used. To avoid this performance penalty, use standard AXI4 bit widths.
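The container rule and the padding behavior can be sketched as follows. The sketch assumes that the port is mapped to the smallest standard width that holds the combined vector, which is consistent with the 140-bit to 256-bit example but is an assumption in general. It also assumes that nonstandard elements are packed with element 0 in the low bits, like the 512-bit example. The names container_width, pack_elements, and read_payload are hypothetical.

```python
STANDARD_WIDTHS = (32, 64, 128, 256, 512, 1024)


def container_width(elem_bits, num_elems=1):
    """Smallest standard AXI4 Master data width that holds the whole vector (assumed rule)."""
    total = elem_bits * num_elems
    for width in STANDARD_WIDTHS:
        if total <= width:
            return width
    raise ValueError(f"{total}-bit Data port exceeds the 1024-bit limit")


def pack_elements(elems, elem_bits):
    """Pack unsigned elements, element 0 in the low bits; upper bits stay zero."""
    word = 0
    for i, elem in enumerate(elems):
        if elem < 0 or elem >> elem_bits:
            raise ValueError("element does not fit in elem_bits")
        word |= elem << (i * elem_bits)
    return word


def read_payload(word, payload_bits):
    """Keep bits payload_bits-1 .. 0 and ignore the padded upper bits."""
    return word & ((1 << payload_bits) - 1)


print(container_width(35, 4))  # 256
w = pack_elements([1, 2, 3, 4], 35)  # bits 255..140 of the 256-bit word are zero
print(read_payload(w | (1 << 200), 140) == w)  # True: garbage in bit 200 is ignored
```

The ValueError for more than 1024 bits corresponds to the error that the Set Target Interface task reports for oversized vector ports.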

Model Designs with Multiple Sample Rates

The HDL Coder software supports designs with multiple sample rates when you run the IP Core Generation workflow. In a multirate design that maps interface ports to AXI4 Master interfaces, ensure that the DUT ports mapped to these interfaces run at the fastest rate of the design after HDL code generation.

To learn more, see Multirate IP Core Generation.

Reference Designs for IP Core Integration

You can integrate the generated HDL IP core with AXI4 Master interfaces into these HDL Coder reference designs:

  • Default System with External DDR3 Memory Access: When your target platform is Xilinx Zynq ZC706 evaluation kit.

  • Default System with External DDR4 Memory Access: When your target platform is Altera Arria 10 SoC development kit.

To use these reference designs, you must have HDL Verifier™ installed. This figure shows a high-level block diagram of the reference design architecture.

In this architecture, the HDL DUT IP block corresponds to the IP core generated by the IP Core Generation workflow. The other blocks in the architecture make up the predefined reference design, which includes a MATLAB®-based JTAG AXI Master IP provided by HDL Verifier. After you run the FPGA design on the board, you can use the JTAG AXI Master IP to initialize the onboard DDR3 external memory with input data from MATLAB. The HDL DUT IP core reads the input data from the external memory through the AXI4 Master interface, performs the algorithm computation, and writes the result to DDR3 memory through the AXI4 Master interface. The JTAG AXI Master IP can then read the result from DDR3 memory so that you can verify it in MATLAB.

Using the addAXI4MasterInterface method of the hdlcoder.ReferenceDesign class, you can integrate the IP core with the AXI4 Master interface into your own custom reference design.

Restrictions

  • Synthesis tool: Must be Xilinx Vivado or Altera Quartus II. Xilinx ISE is not supported.

  • Target workflow: Use the IP Core Generation workflow. To run the workflow, open the HDL Workflow Advisor from your DUT algorithm in Simulink. The MATLAB to HDL workflow is not supported.

  • Processor/FPGA synchronization: Must be set to Free running mode.
