Convolution-Capp
ILP_Rate
ILP_rate = ILP_16k / ILP_32
ILP_32 = 32/dependency depth
Dependency depth:
The SIZE_FILTER_1 loop needs 4 instructions for initialization, plus 3 instructions to calculate s (used to index the output image array).
The SIZE_FILTER_3 loop cannot run in parallel with SIZE_FILTER_2 and therefore can only begin executing once the first iteration of the SIZE_FILTER_2 loop completes.
Storing a value at temp_filter[n] requires 5 instructions, while storing a value at temp_Image[n] requires 4. In addition, 4 instructions are needed to set up the SIZE_FILTER_2 loop.
Calculating sum requires 6 instructions, plus 4 to set up the SIZE_FILTER_3 loop.
Finally, storing the first result in the output image array requires an additional 3 instructions.
Therefore, the longest dependency path runs through initializing n, loading temp_filter[n], calculating sum and storing the output image:
1. initializing n (1 reg op)
2. loading temp_filter[n] (2 arith add + 1 arith mul + 2 fp address translations (base pointer + offset) + 1 load + 1 store)
3. calculating sum (1 fp mul + 1 fp add)
4. storing output image (1 store)
dependency depth = 1 + (2 + 1 + 2 + 1 + 1) + (1 + 1) + 1 = 11
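The depth above is the longest path through the dependency graph of one iteration. A minimal Python sketch of that calculation, assuming every instruction has a one-cycle latency. The node names are hypothetical labels, and the four-instruction temp_Image[n] branch is an assumed parallel path (based on the 4-instruction count given earlier) included only to show that a shorter branch does not change the result.

```python
from functools import lru_cache

LATENCY = 1  # assumption: every instruction completes in one cycle

# Hypothetical dependency graph for one iteration; each node lists the
# nodes it depends on. Names are illustrative, not from the original kernel.
DEPS = {
    "init_n": [],                      # 1 reg op
    # loading temp_filter[n]: 2 add, 1 mul, 2 address calcs, 1 load, 1 store
    "f_add1": ["init_n"],
    "f_add2": ["f_add1"],
    "f_mul": ["f_add2"],
    "f_addr1": ["f_mul"],
    "f_addr2": ["f_addr1"],
    "f_load": ["f_addr2"],
    "f_store": ["f_load"],
    # assumed parallel branch: the 4 instructions that store temp_Image[n]
    "i_1": ["init_n"],
    "i_2": ["i_1"],
    "i_3": ["i_2"],
    "i_4": ["i_3"],
    # calculating sum needs both operands
    "fp_mul": ["f_store", "i_4"],
    "fp_add": ["fp_mul"],
    # storing the output image
    "out_store": ["fp_add"],
}


@lru_cache(maxsize=None)
def finish_time(node: str) -> int:
    """Cycle in which `node` completes: longest path ending at `node`."""
    start = max((finish_time(p) for p in DEPS[node]), default=0)
    return start + LATENCY


def dependency_depth() -> int:
    """Length of the critical (longest) path through the whole graph."""
    return max(finish_time(n) for n in DEPS)


print(dependency_depth())  # 11
```

Memoizing finish_time visits each node once, so the same approach scales to larger loop bodies; the temp_Image branch finishes at cycle 5 and never becomes critical.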
Therefore, ILP_32 = 32/11 ≈ 2.909
The two inner SIZE_FILTER loops can be merged into a single loop with 25 operations per iteration.
These 25 instructions are executed for 512*512*5 iterations.
Assuming the compiler unrolls the loop by a factor of 4, this becomes 256*256*5 iterations of 100 instructions each.
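A quick check of the iteration counts, assuming "unrolls 4 loops" means an unroll factor of 4 (the reading under which each unrolled iteration has 25 × 4 = 100 instructions). The variable names are illustrative.

```python
OPS_PER_ITER = 25              # operations in the fused SIZE_FILTER loop body
ROLLED_ITERS = 512 * 512 * 5   # iterations before unrolling
UNROLL = 4                     # assumed unroll factor

unrolled_iters = ROLLED_ITERS // UNROLL          # 256 * 256 * 5
ops_per_unrolled_iter = OPS_PER_ITER * UNROLL    # 100

# Unrolling regroups the work; the total instruction count is unchanged.
total_rolled = ROLLED_ITERS * OPS_PER_ITER
total_unrolled = unrolled_iters * ops_per_unrolled_iter

print(unrolled_iters, ops_per_unrolled_iter, total_unrolled)  # 327680 100 32768000
```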
A window of 16384 instructions holds 163 of these 100-instruction iterations (16384/100 = 163.84, rounded down), which can be executed in a pipeline.
Adding the dependency depth of 11 gives 163 + 11 = 174 cycles for execution.
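The window arithmetic as a small sketch. The constants are the values used in these notes, and the cycle count reproduces the 163 + 11 accounting rather than modelling a real pipeline.

```python
WINDOW = 16384          # instruction window size
INSTRS_PER_ITER = 100   # instructions per unrolled iteration
DEPTH = 11              # dependency depth of one iteration

iters_in_window = WINDOW // INSTRS_PER_ITER  # only whole iterations fit
cycles = iters_in_window + DEPTH             # 163 + 11, as counted above

print(iters_in_window, cycles)  # 163 174
```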
ILP_16k = 16384/174 ≈ 94.16
ILP_rate = ILP_16k / ILP_32 = 94.16/2.909 ≈ 32.368
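The whole calculation end to end, as a sketch with the numbers from these notes as default arguments (the function and parameter names are hypothetical):

```python
def ilp_rate(window_large=16384, cycles_large=174, window_small=32, depth=11):
    """Ratio of ILP with a large window to ILP with a small window."""
    ilp_small = window_small / depth         # ILP_32 = 32 / dependency depth
    ilp_large = window_large / cycles_large  # ILP_16k = 16384 / 174
    return ilp_large / ilp_small


print(round(ilp_rate(), 3))  # 32.368
```

Working from the exact fractions, 16384 × 11 / (174 × 32) ≈ 32.3678, which agrees with the rounded intermediate values above.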