Lbdiv256:
Local branch divergence is a measure of how branchy a region of code is, i.e. how
much a region of code suffers from branch divergence.
Each branch operation evaluates to either being taken or not.
A branch operation that consistently makes the same branch decision is said to have
low divergence while one that frequently switched between taking and not taking the branch has high divergence.
Lbdiv256 tracks these decisions in branch decision windows of size 256 and this
distinction (low divergence vs high divergence) is made based on these windows.
Classifying Lbdiv256:
Lbdiv256 can be classified into low, medium or high as follows:
Bucket | Condition |
---|---|
Low |
The region of code contains no branch conditions.
OR
The region of code contains highly repeated branch operations that
make the same branch decision in almost all iterations.
|
Medium | The region of code contains a mix of branch operations that change decisions frequently (within every 256 decisions) and others that make the same decision in almost all iterations. |
High | The region of code contains a highly repeated branch operation that changes its decision frequently, i.e. within every 256 decisions. |
Example 1:
Consider the following code:
for (i=0; i<1000; i++){ if(i%2 == 0){ result[i] = arrayOne[i] + arrayTwo[i]; } if(i == 999) print(“This is the last loop iteration\n”); }This region contains two branch conditions: (i%2 == 0) & (i == 999). It is quite obvious in this case that the first condition (i%2 == 0) changes its decision (taking the branch or not) frequently while the second condition (i == 999) makes the same decision (not branching) for almost all iterations of this loop. Therefore, a good estimate of Lbdiv256 is mid.
Example 2:
Let's look at another example:
for (i=0; i<1000; i++){ if(i%2 == 0){ for(j=0; j<10; j++){ if(visited[i][j] == true){ result[i] = arrayOne[i] + arrayTwo[i]; visited[i][j] = true; } } } }This region of code contains a second nested loop and branch expression. The innermost branch condition (visited[i][j] == true) is executed 10 times more than the outer branch (i%2 == 0). The inner branch is therefore highly repeated and estimating its divergence should be enough.