Note on Stride:
Memory stride is the distance between memory accesses and is measured in two ways:
Local stride: This is the memory stride between two consecutive memory accesses made by the same memory reference.
Global stride: This is the memory stride between memory accesses made by consecutive memory references.
Consider the following example:

    for (i = 0; i < 1000; i++) {
        for (j = 0; j < 10; j++) {
            sum += arrayOne[i] + arrayTwo[j];
        }
        result[i] = sum;
    }

The memory stride between consecutive memory accesses for the arrayOne memory reference is its local stride; for example, (starting address of arrayOne[50] - ending address of arrayOne[49]) is a local stride of arrayOne.
Note: This also means that if a region of code contains only 1 memory reference, global stride will be the same as local stride.
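The two definitions can be sketched in C over a recorded trace of accesses. This is a minimal illustration, not the tool's implementation: the `access_t` type and the functions `stride_between`, `global_stride` and `local_stride` are hypothetical names. It also assumes that the "ending address" is the address of the last byte touched, that the stride is taken as an absolute distance, and that "consecutive memory references" means accesses that are adjacent in program order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded access (hypothetical type): which memory reference
 * issued it, where it starts, and how many bytes it touches. */
typedef struct {
    int       ref_id;  /* identifies the memory reference, e.g. arrayOne */
    uintptr_t start;   /* address of the first byte accessed */
    size_t    size;    /* bytes accessed, e.g. 8 for a double */
} access_t;

/* Assumed convention: stride = |start of current access -
 * address of the last byte of the previous access|. */
uintptr_t stride_between(access_t prev, access_t cur)
{
    assert(prev.size > 0);
    uintptr_t prev_end = prev.start + prev.size - 1;
    return cur.start > prev_end ? cur.start - prev_end
                                : prev_end - cur.start;
}

/* Global stride of access k: distance from the access immediately
 * before it in the trace, whichever reference issued that access. */
uintptr_t global_stride(const access_t *trace, size_t k)
{
    assert(k > 0);
    return stride_between(trace[k - 1], trace[k]);
}

/* Local stride of access k: distance from the most recent earlier
 * access issued by the same reference. Returns 1 and writes *out on
 * success, or 0 if that reference has no earlier access. */
int local_stride(const access_t *trace, size_t k, uintptr_t *out)
{
    for (size_t p = k; p-- > 0; ) {
        if (trace[p].ref_id == trace[k].ref_id) {
            *out = stride_between(trace[p], trace[k]);
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, two back-to-back accesses to the same 8-byte location give a stride of 7, which matches the description of locStride7. If the trace contains a single memory reference, `local_stride` and `global_stride` return the same value for every access after the first.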
locStride7:
locStride7 or local stride 7 looks at the number of memory references that have a local stride of 7.
It effectively measures the number of references that repeatedly access the same 8-byte memory location (such as a double or uint64_t).
Local stride is evaluated for each memory reference individually; the final locStride7 value is the aggregate of the locStride7 values of all memory references in the region.
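The aggregation can be pictured as a simple count. The sketch below assumes each memory reference has already been summarised by one representative local stride; the `ref_summary_t` type and the `count_loc_stride7` function are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-reference summary: one representative local stride
 * for each memory reference in the region of code. */
typedef struct {
    const char *name;          /* e.g. "arrayOne" */
    uintptr_t   local_stride;  /* representative local stride */
} ref_summary_t;

/* Count the references whose local stride is exactly 7, i.e. those
 * that keep hitting the same 8-byte location. */
size_t count_loc_stride7(const ref_summary_t *refs, size_t n_refs)
{
    assert(refs != NULL || n_refs == 0);
    size_t count = 0;
    for (size_t i = 0; i < n_refs; i++) {
        if (refs[i].local_stride == 7) {
            count++;
        }
    }
    return count;
}
```

The returned count, together with the total number of references, is the kind of input a low/medium/high classification would need.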
Classifying locStride7:
locStride7 can be classified as low, medium or high as follows:
Bucket | Condition |
---|---|
Low | The region of code contains no memory references that access 8-byte memory locations, OR the region of code contains a number of memory references and 1 or fewer references out of every 3 have a local stride of 7. |
Medium | The region of code contains a number of memory references and 1 out of every 2 references has a local stride of 7. |
High | The region of code contains a number of memory references and 2 or more references out of every 3 have a local stride of 7. |
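
The buckets in the table can be expressed as a small C function. This is a sketch under one explicit assumption: the table only names the ratios 1 in 3, 1 in 2 and 2 in 3, so here every ratio strictly between 1/3 and 2/3 is treated as Medium. The enum and the `classify_loc_stride7` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STRIDE7_LOW, STRIDE7_MEDIUM, STRIDE7_HIGH } stride7_bucket_t;

/* stride7_refs: references with a local stride of 7
 * total_refs:   all memory references in the region of code */
stride7_bucket_t classify_loc_stride7(size_t stride7_refs, size_t total_refs)
{
    assert(stride7_refs <= total_refs);

    if (total_refs == 0) {
        return STRIDE7_LOW;  /* nothing accesses an 8-byte location */
    }
    /* Integer cross-multiplication avoids floating-point rounding. */
    if (3 * stride7_refs <= total_refs) {
        return STRIDE7_LOW;  /* 1 or fewer out of every 3 */
    }
    if (3 * stride7_refs >= 2 * total_refs) {
        return STRIDE7_HIGH; /* 2 or more out of every 3 */
    }
    /* Assumption: everything in between, i.e. around 1 out of every 2. */
    return STRIDE7_MEDIUM;
}
```

For instance, a region in which 1 of 2 references has a local stride of 7 would fall into the Medium bucket under this reading.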
Example:
Consider the following code:

    for (i = 0; i < 1000; i++) {
        for (j = 0; j < 10; j++) {
            sum += arrayOne[i] + arrayTwo[j];
        }
        result[i] = sum;
    }

Assume that all the elements in this region are of type double. This region contains 3 memory references (arrayOne, arrayTwo and result), but the accesses to arrayOne and arrayTwo are repeated most often. Therefore, we can ignore the accesses made to result.