Convolution-Capp

ILP_rate

ILP_rate = ILP_16k / ILP_32

ILP_32 = 32/dependency depth

dependency depth:

The SIZE_FILTER_1 loop needs 4 instructions to be initialized, plus 3 instructions to calculate s (used to index the output image array).

The SIZE_FILTER_3 loop cannot run in parallel with SIZE_FILTER_2 and therefore can only begin executing once the first iteration of the SIZE_FILTER_2 loop completes.

Storing a value at temp_filter[n] requires 5 instructions, while storing a value at temp_Image[n] requires 4 instructions. In addition, 4 instructions are needed to set up the SIZE_FILTER_2 loop.

Calculating sum requires 6 instructions, plus 4 to set up the SIZE_FILTER_3 loop.

Finally, storing the first result in the output image array requires an additional 3 instructions.

Therefore, the longest path is: loading temp_filter[n], calculating sum, and storing to the output image. Its dependent operations are:

1. initializing n (1 reg op)

2. loading temp_filter[n] (2 arith add + 1 arith mul + 2 fp address trans (base pointer + offset) + 1 load + 1 store)

3. calculating sum (1 fp mul + 1 fp add)

4. storing output image (1 store)

dependency depth = 1 + 2 + 1 + 2 + 1 + 1 + 1 + 1 + 1 = 11

Therefore,

ILP_32 = 32/11 = 2.909
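
The sum above can be checked with a short script. This is only a minimal sketch: the dictionary keys are labels chosen here for illustration (they are not identifiers from the program), and the counts are copied from steps 1 to 4.

```python
# Latencies along the longest dependency path, copied from steps 1-4.
# Key names are illustrative labels, not identifiers from the program.
critical_path = {
    "step 1: reg op (init n)": 1,
    "step 2: arith add": 2,
    "step 2: arith mul": 1,
    "step 2: fp address trans": 2,
    "step 2: load": 1,
    "step 2: store": 1,
    "step 3: fp mul": 1,
    "step 3: fp add": 1,
    "step 4: store (output image)": 1,
}

WINDOW_32 = 32  # instruction window size used for ILP_32

dependency_depth = sum(critical_path.values())
ilp_32 = WINDOW_32 / dependency_depth

print("dependency depth =", dependency_depth)
print("ILP_32 =", round(ilp_32, 3))
```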

The two inner SIZE_FILTER loops can be combined into a single loop with 25 operations per iteration of the loop.
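
The program's source is not reproduced in these notes, so the following is only a generic illustration of what combining the two loops means. It assumes the SIZE_FILTER_2 loop copies operands into temp_filter and temp_Image and the SIZE_FILTER_3 loop accumulates sum, as described above. The function names, the 5-element inputs, and the one-dimensional window are hypothetical.

```python
# Hypothetical stand-ins for one filter row and one image window.
filter_taps = [1.0, 2.0, 3.0, 2.0, 1.0]
image_window = [10.0, 20.0, 30.0, 40.0, 50.0]


def two_loops(filt, window):
    """Copy operands into temporaries, then accumulate in a second loop."""
    temp_filter = [0.0] * len(filt)
    temp_image = [0.0] * len(window)
    # First loop (assumed role of SIZE_FILTER_2): fill the temporaries.
    for n in range(len(filt)):
        temp_filter[n] = filt[n]
        temp_image[n] = window[n]
    # Second loop (assumed role of SIZE_FILTER_3): multiply-accumulate.
    total = 0.0
    for n in range(len(filt)):
        total += temp_filter[n] * temp_image[n]
    return total


def fused_loop(filt, window):
    """Single loop: read both operands and accumulate in the same iteration."""
    total = 0.0
    for n in range(len(filt)):
        total += filt[n] * window[n]
    return total


# Both versions perform the same multiplications in the same order.
assert two_loops(filter_taps, image_window) == fused_loop(filter_taps, image_window)
```

Because the fused version keeps the same order of operations, it produces the same sum while exposing a single loop body to the scheduler.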

These 25 instructions will be executed for 512*512*5 iterations

Assuming the compiler unrolls the loop by a factor of 4, there would be 256*256*5 iterations of 100 instructions each
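
The iteration counts can be verified as follows. This sketch assumes an unroll factor of 4; the constant names are chosen here for illustration.

```python
IMAGE_DIM = 512        # image is 512 x 512
INNER_ITERS = 5        # the factor 5 in 512*512*5
INSTR_PER_ITER = 25    # instructions in the combined loop body
UNROLL = 4             # assumed unroll factor

iterations = IMAGE_DIM * IMAGE_DIM * INNER_ITERS
unrolled_iterations = iterations // UNROLL
instr_per_unrolled_iter = INSTR_PER_ITER * UNROLL

# Unrolling changes the shape of the work, not the total instruction count.
assert iterations * INSTR_PER_ITER == unrolled_iterations * instr_per_unrolled_iter

print(unrolled_iterations, "iterations of", instr_per_unrolled_iter, "instructions")
```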

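A window of 16384 instructions fits 163 unrolled iterations (16384 / 100, rounded down), which can be executed in a pipeline

This leads to 163 + 11 = 174 cycles required for execution: 163 pipelined iterations plus the dependency depth of 11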

ILP_16k = 16384 / 174 = 94.16

ILP_rate = ILP_16k / ILP_32 = 94.16 / 2.909 = 32.368
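
The final figures can be reproduced end to end with the sketch below. The constant names are illustrative. It follows the model used in these notes: cycles equal the number of unrolled iterations that fit in the window plus the dependency depth.

```python
WINDOW_16K = 16384             # large instruction window
WINDOW_32 = 32                 # small instruction window
INSTR_PER_UNROLLED_ITER = 100  # 25 instructions x unroll factor 4
DEPENDENCY_DEPTH = 11          # longest dependency path

# Whole unrolled iterations that fit in the large window.
iters_in_window = WINDOW_16K // INSTR_PER_UNROLLED_ITER

# Pipelined execution: one iteration per cycle plus the depth of the path.
cycles = iters_in_window + DEPENDENCY_DEPTH

ilp_16k = WINDOW_16K / cycles
ilp_32 = WINDOW_32 / DEPENDENCY_DEPTH
ilp_rate = ilp_16k / ilp_32

print("ILP_16k  =", round(ilp_16k, 2))
print("ILP_32   =", round(ilp_32, 3))
print("ILP_rate =", round(ilp_rate, 3))
```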