
Analyze Memory Bandwidth Using Traffic Generators

This example shows how to analyze memory bandwidth for an SoC application. In memory-intensive hardware designs, multiple masters often access a common DDR memory. In such cases, it is important to analyze the dynamic bandwidth requirements of all memory masters, because the results guide both the algorithm design and the choice of hardware board for deployment. You can simulate the memory traffic using Memory Traffic Generator blocks, analyze the bandwidth usage, and then verify it on hardware.


Supported Hardware Platforms

Design Task

Consider an application that performs HD video processing in an FPGA on real-time input and output. This application has four memory masters competing for DDR access simultaneously. Memory master 1 writes incoming video frames to memory, and memory master 4 reads video frames out of memory and sends them to the output display. Memory master 2 reads the data from memory for processing in the FPGA, and memory master 3 writes the processed data back to memory.

Each master operates on HD video with the following characteristics:

Each master requires the following minimum memory bandwidth to sustain a frame rate of 60 FPS.
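As a rough cross-check of the per-master requirement, the following sketch computes the sustained bandwidth needed to move one video frame per frame period. The frame geometry (1920x1080 pixels at 2 bytes per pixel) is an assumption, not a value stated in this text. It was chosen because it reproduces the 248 MBps requirement quoted later in this example.

```python
# Assumed frame geometry (hypothetical, not given in the text):
# 1080p video stored at 2 bytes per pixel.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 2
FRAME_RATE = 60  # frames per second, from the text


def required_bandwidth_mbps(width, height, bytes_per_pixel, fps):
    """Return the sustained bandwidth in MBps (1 MB = 1e6 bytes)."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps / 1e6


bw = required_bandwidth_mbps(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAME_RATE)
print(f"Required bandwidth per master: {bw:.1f} MBps")
```

Under these assumptions each master needs just under 249 MBps, which is consistent with the 248 MBps figure used in the simulation results.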

Assume the memory controller characteristics are as follows:

Design Using SoC Blockset

Create a model using Memory Controller and Memory Traffic Generator blocks to model four memory masters.

Memory Controller: Set the memory controller parameters in Configuration Parameters > Hardware Implementation > Target Hardware Resources. Under the FPGA design (mem controllers) tab, set the clock frequency to 200 MHz and the data width to 32 bits. Under the FPGA design (debug) tab, select 'Include AXI interconnect monitor'.

Memory Traffic Generators 1 & 4: The memory traffic characteristics for Master1 and Master4 are the same, because both represent streaming of video frames to and from memory. Set the memory traffic characteristics for masters 1 and 4 as follows:

Burst inter access time: Frame Period/Number of Burst requests = 16.67e-3/8100 = 2.058e-6 sec. Because this is constant data traffic, the data arrives continuously at a constant rate. Set the burst times as below:

Update the Memory Traffic Generator1 and Memory Traffic Generator4 block masks with the above values. Set the request type to Writer for Memory Traffic Generator1 and to Reader for Memory Traffic Generator4. Clear the 'Wait for burst done' option in both block masks, because these masters represent sources and sinks with continuous traffic, such as an HDMI camera and a display.
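The burst inter access time for masters 1 and 4 can be reproduced with a short calculation. This is a sketch under stated assumptions: the frame size (1920x1080 pixels at 2 bytes per pixel) and the burst length of 128 beats are not given in this text. They were picked because they yield the 8100 burst requests per frame used above for a 32-bit data width, and 4050 for the 64-bit data width used later in this example.

```python
FRAME_BYTES = 1920 * 1080 * 2  # assumed frame size in bytes (hypothetical)
BURST_LENGTH = 128             # assumed beats per burst (hypothetical)
FRAME_PERIOD = 16.67e-3        # seconds per frame at 60 FPS, from the text


def burst_schedule(data_width_bits):
    """Return (bursts per frame, burst inter access time in seconds)."""
    bytes_per_burst = BURST_LENGTH * data_width_bits // 8
    bursts_per_frame = FRAME_BYTES // bytes_per_burst
    inter_access_time = FRAME_PERIOD / bursts_per_frame
    return bursts_per_frame, inter_access_time


for width in (32, 64):
    bursts, t = burst_schedule(width)
    print(f"{width}-bit: {bursts} bursts per frame, {t:.3e} s between bursts")
```

Doubling the data width doubles the bytes carried per burst, so the same frame needs half as many bursts and the interval between them doubles.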

Memory Traffic Generators 2 & 3: Memory Traffic Generator2 represents the reader for the FPGA algorithm, and Memory Traffic Generator3 represents the writer from the FPGA algorithm. Set the memory traffic characteristics for masters 2 and 3 as follows:

Burst inter access time: (Burst Length + 10)/Clock frequency = 6.9e-7 sec (0.69 us). To allow some randomness in the burst times for read and write requests, reflecting variation in the demands of the algorithm, set the burst times as below:
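The 0.69 us figure for masters 2 and 3 is a cycle count converted to seconds: the burst length plus 10 cycles, divided by the 200 MHz controller clock frequency. The sketch below checks that arithmetic. The burst length of 128 is an assumption inferred from the result and is not stated in this text.

```python
CLOCK_FREQUENCY = 200e6  # Hz, memory controller clock from the text
BURST_LENGTH = 128       # assumed beats per burst (hypothetical)
OVERHEAD_CYCLES = 10     # extra cycles per burst, from the formula in the text


def burst_inter_access_time(burst_length, overhead_cycles, clock_frequency):
    """Convert a per-burst cycle count into seconds."""
    return (burst_length + overhead_cycles) / clock_frequency


t = burst_inter_access_time(BURST_LENGTH, OVERHEAD_CYCLES, CLOCK_FREQUENCY)
print(f"Burst inter access time: {t:.2e} s")
```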

Simulate

Run the model. After the simulation completes, open the Memory Controller block and click View performance plots under the Performance tab. Select all the masters under the Bandwidth tab and click Create Plot. The plot shows that each master achieves a bandwidth of roughly 190 MBps, which falls short of the required 248 MBps. The warnings in the Diagnostic Viewer also indicate this shortfall.

To meet the required bandwidth, change the data width of the controller from 32 to 64 in the configuration parameter settings under Target Hardware Resources. This requires changing the Memory Traffic Generator settings accordingly, as follows:

Burst inter access time for Memory Traffic Generators 1 & 4: Frame Period/Number of Burst requests = 16.67e-3/4050 = 4.116e-6 sec. Set the burst times as below:

There is no change in First burst time and Random time between the bursts for Memory Traffic Generators 2 and 3, since they are determined based on algorithm needs.

The new parameter settings are as follows:

Simulate the model and open the Bandwidth plot from the Memory Controller block as described earlier. Notice that the memory bandwidth achieved by Memory Traffic Generators 1 and 4 is 248 MBps, and the bandwidth achieved by Memory Traffic Generators 2 and 3 is around 500 MBps. This meets the design requirement, because all the masters meet the real-time requirement of 248 MBps. Observe that there are no warnings in the Diagnostic Viewer, because no burst requests are dropped.
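A back-of-the-envelope estimate helps explain both simulation outcomes. The sketch below assumes an idealized controller that transfers at most one data word per clock cycle and ignores arbitration, refresh, and other overhead. That simplification is an assumption made here, not a description of the Memory Controller block's internal model.

```python
CLOCK_FREQUENCY = 200e6  # Hz, from the controller settings in the text
REQUIRED_MBPS = 248      # per-master requirement, from the text
NUM_MASTERS = 4


def peak_bandwidth_mbps(data_width_bits, clock_frequency=CLOCK_FREQUENCY):
    """Ideal upper bound in MBps: one data word per clock cycle."""
    return data_width_bits / 8 * clock_frequency / 1e6


total_required = NUM_MASTERS * REQUIRED_MBPS
for width in (32, 64):
    peak = peak_bandwidth_mbps(width)
    verdict = "enough" if peak >= total_required else "not enough"
    print(f"{width}-bit: peak {peak:.0f} MBps vs {total_required} MBps "
          f"required in total ({verdict})")
```

With a 32-bit data width, the ideal peak is below the combined requirement of the four masters, which is consistent with each master reaching only about 190 MBps. With a 64-bit data width, the ideal peak exceeds the sum of the bandwidths observed in the second simulation (two masters at 248 MBps and two at about 500 MBps).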

Implement and Run on Hardware

SoC Blockset Support Package for Xilinx Devices is required for this section.

To implement the model on a supported FPGA board, use the SoC Builder application. By default, the model is implemented on the Xilinx® Zynq® ZC706 evaluation kit, because the model is configured for that board.

The AXI Traffic Generator (ATG), the hardware IP core for the Memory Traffic Generator block, does not support random burst inter access times. Unlike the Memory Traffic Generator block used in simulation, it also differentiates between Reader and Writer masters in its arbitration policy. Therefore, before implementing on hardware, modify the Memory block settings as follows:

Open SoC Builder from the Tools menu and follow these steps:

The FPGA synthesis may take more than 30 minutes to complete. To save time, you can use the provided pre-generated bitstream by following these steps:

>> copyfile(fullfile(matlabroot,'toolbox','soc','socexamples','bitstreams','soc_memory_traffic_generator-zc706.bit'), './soc_prj');

To run this example, copy the example test bench to your project folder.

>> copyfile(fullfile(matlabroot,'toolbox','soc','socexamples','soc_memory_traffic_generator_aximaster.m'), './soc_prj','f');

The test bench configures the generated hardware ATG IP cores for the Memory Traffic Generators. To run on hardware, increase the number of burst requests by a factor of 100, because the test bench uses MATLAB® as AXI Master IP to read the samples back into MATLAB®, which involves a substantial delay in accessing the hardware. Load the soc_memory_traffic_generator_zc706_aximaster.mat file, multiply the number of burst requests for all the masters in the ATG configuration by 100, and save the .mat file.

Enter the following command to run the test bench soc_memory_traffic_generator_aximaster.

>> soc_memory_traffic_generator_aximaster

After you run the test bench, the following output is generated, showing the memory traffic. All masters pass the bandwidth requirements.

Implementation on Xilinx® Kintex® 7 KC705 development board: To implement the model on the KC705 development board, you must first configure the model for the Xilinx® Kintex® 7 KC705 development board and set the following example parameters. Open Model Configuration Parameters, navigate to the Hardware Implementation tab, and perform the following:

Next, open SoC Builder and follow the steps described above for the Xilinx® Zynq® ZC706. Modify the copyfile command to match the Kintex® 7 KC705 development board bitstream, as below.

>> copyfile(fullfile(matlabroot,'toolbox','soc','socexamples','bitstreams','soc_memory_traffic_generator-kc705.bit'), './soc_prj');

In summary, you simulated the memory traffic for a prospective design before designing the algorithms. You analyzed memory bandwidth and modified memory parameters to meet the design requirement. You verified the results on hardware.