SpMV with Block CSR

A sparse matrix is a matrix which contains a large number of zero elements as opposed to a dense matrix which contains a significantly higher number of non-zero elements. Due to the "sparse" nature of non-zero elements, a sparse matrix can be compressed to reduce the amount of memory required for storage. One of the the more common compression methods is Compressed Sparse Row or CSR. In CSR compression, a sparse matrix is compressed to three dense arrays:

  1. Value array: This 1D array of floats contains all non-zero elements of the array in row major format.
  2. Column index: This 1D array of integers contains the column number of non-zero elements in the sparse matrix.
  3. row pointer: A 1D array of integers which is used to index the value array and contains the indices of the first non-zero element of a row.

Block CSR is a block version of CSR in which the sparse matrix is split into fixed size blocks. value_array, column_index and row_pointer arrays are modified to handle blocks rather than individual values.

Source code snippet

//Number of rows/columns in one block
#define ROWS_IN_ONE_BLOCK  4
#define COLS_IN_ONE_BLOCK  4

//Accounting for presense of elements in sparse block
#define NOT_ACCOUNTED      0
#define ACCOUNTED          1

#define BLOCK_SIZE         (ROWS_IN_ONE_BLOCK*COLS_IN_ONE_BLOCK)
#define INDEX(i,j,n)       (i + (j*COLS_IN_ONE_BLOCK) + (n*BLOCK_SIZE))

int num_rows = 14816;
int num_cols = 14816; 
int *row_blk_ptr = (int *)malloc((num_rows_sparse+1)*sizeof(int));
int *col_blk_ind = (int *)malloc(num_rows_sparse*num_cols_sparse*sizeof(int));;
float *apk = (float *)calloc(dense_blk_cnt*BLOCK_SIZE,sizeof(float));
float *x = (float *)malloc(num_cols * sizeof(float));

float *y = (float *)calloc(num_rows,sizeof(float));

//Number of rows in sparse block matrix
int num_rows_sparse = num_rows/ROWS_IN_ONE_BLOCK;

//Number of columns in sparse block matrix
int num_cols_sparse = num_cols/COLS_IN_ONE_BLOCK;
for( i = 0; i < num_rows_sparse; i++){
    for( j = row_blk_ptr[i]; j < row_blk_ptr[i+1]; j++, col_blk_ind++, apk += BLOCK_SIZE){
    //this loop averages <50 iterations 
	for( w = 0; w < 4; ++w ){
            y0 = y[4*i+w];
            for( k = 0; k < 4; ++k ){
                y0 += apk[0+w*4+k]*x[*col_blk_ind +k];
            }
            y[4*i+w] = y0;
        }
    }
}