This example demonstrates how to solve a system of equations using hardware-efficient MATLAB® code embedded in Simulink® models. The model used in this example is suitable for HDL code generation for fixed-point inputs. The algorithm uses a computational architecture that shares computational and memory units across different steps of the solver algorithm. This is beneficial when deploying fixed-point algorithms to FPGA or ASIC devices with constrained resources. While this solver has a smaller throughput than a fully pipelined implementation, it also has a smaller on-chip footprint, making it suitable for resource-conscious designs.
Any system of linear equations can be expressed in matrix form as
where is an
-by-
matrix of known data,
is an
-by-1 vector of unknowns, and
is an
-by-1 vector.
In general, linear systems are most effectively solved by first decomposing the system into lower triangular form, and then finding the unknowns directly through back substitution. This method is both numerically stable and efficient. It is thus a good choice for embedded applications.
Any matrix can be decomposed as
where ,
is an
-by-
upper triangular matrix, and
is the
-by-
identity matrix. This allows us to transform a system of equations into the upper triangular form
This equation can be solved through back substitution when is non-singular. A benefit of this method is that
does not need to be calculated. Instead, the transformations that take
to
can simultaneously be applied to
. This gives the same form as the previous equation with a smaller memory and computational footprint.
The QR Decomposition is performed through a series of Givens rotations. These are implemented in hardware using the Coordinate Rotation Digital Computer (CORDIC) technique. For a detailed explanation of the CORDIC QR Algorithm, see the example Perform QR Factorization Using CORDIC..
Real Burst Matrix Solve using QR Decomposition supports HDL Code generation for binary point scaled fixed-point data. It is designed with this application in mind, and employs hardware specific semantics and optimizations. One of these optimizations is resource sharing.
When deploying intricate algorithms to FPGA or ASIC devices, there is often a trade-off between a computation's resource usage and its total throughput. While fully pipelined and parallelized algorithms have the greatest throughput, they are often too resource intensive to deploy on real devices. By implementing scheduling logic around one or several core computational circuits, it is possible to reuse resources throughout a computation. The result is an implementation with a much smaller footprint, albeit at the cost of total throughput. This is often an acceptable trade-off, as resource shared designs can still meet overall latency requirements.
All of the key computational units in Real Burst Matrix Solve using QR Decomposition are reused throughout the computation life cycle. This includes not only the CORDIC circuitry used to perform the Givens rotations during the QR decomposition, but also the adders and multipliers used during back substitution. This saves both DSP and fabric resources when deploying to FPGA or ASIC devices.
In simulation, single, double, binary-point scaled fixed point, and binary-point scaled-double data are supported. However, for HDL code generation, only binary-point scaled fixed-point types are supported.
Real Burst Matrix Solve using QR Decomposition uses matrices A
and B
from either the model or base workspace to define the system of equations to solve. It takes four input parameters on its mask: the number of rows in matrices A
and B
, the number of columns in matrix A
, the number of columns in matrix B
, and the output datatype. The block mask is shown below.
These parameters are defined in the workspace by the variables m
, n
, p
, and OutputType
, respectively. Additionally, the variable num_samples
needs to be defined in the model or base workspace.
To begin, we need to a series of systems of equations to solve. Real world data often come from Gaussian ensembles, so random matrices drawn from the standard normal distribution will be used in this example. The standard normal distribution has a mean value of 0 and a variance of 1. Its properties describes many, though not all, sources of randomness found in nature. By virtue of this, we will use data drawn from this distribution as an example to demonstrate how to use Real Burst Matrix Solve using QR Decomposition.
First, choose a size for the system being solved.
m = 4; n = 4; p = 1; num_samples = 100;
Next, generate random, double-precision data. This data is used as a numerical benchmark for the required dynamic range. This is helpful when converting to fixed point.
A = randn(m,n,num_samples); B = randn(m,p,num_samples);
Here, the first two indices correspond to the matrix indices in each sample, while the third index corresponds to the sample number. For example A(1,2,3)
corresponds to the element in the first row and second column of the third sample.
Before converting to fixed point, we will solve the system in floating-point arithmetic. Then, we will use the range information gained from this simulation to choose suitable input and output datatypes for a fixed-point version of the model.
To generate the floating-point version of the model, use the function call below.
mdl = fixed.getMatrixSolveModel(A,B);
This function sets data in the model workspace. Thus, if you wish to change the data stored in or
, you must either modify the data in the model workspace, or generate a new model using fixed-point data. Later in this example, we will use the latter approach.
The model generated by the function above is already configured to simulate without modification. The generated model uses a block called Data Handler to send data to the matrix solver. It communicates with the matrix solver by listening to the solver's ready port, and pulling validIn high on ready's rising edge. This tells Real Burst Matrix Solve using QR Decomposition that the current rows at ports A(i,:) and B(i,:) are valid. Whenever ready and valid are simultaneously high, the data of these ports are treated as the next valid row of the matrix.
Whenever the solver sees that both ready and validIn are true, it consumes the data input at ports A(i,:) and B(i,:). It must keep validIn held high if it wishes to do so. Then, Real Burst Matrix Solve using QR Decomposition will hold this row in a buffer until it is ready to process it. This mode of operation is shown below, in which two samples are consumed.
Whenever ready goes high, it can consume two rows. If two rows are sent, Real Burst Matrix Solve using QR Decomposition will store the second row in an internal buffer.It is not mandatory to send two rows when ready is asserted, and if only one sample is sent because validIn is deasserted, ready will deassert and only reassert when more data is need for the computation.
Note that adding additional delays in either the path from ready to Data Handler, or between Data Handler to validIn will add a back pressure that is not expected. In this scenario, samples will be dropped. We recommend that you either only send data on ready's leading edge, or that you do not insert additional delays between the block and the data source. Moreover, if you plan to generate HDL code using this block, we recommend turning off pipelining optimizations, as these can add delays on these paths.
To simulate the model, click the simulate button or use the sim
command.
out = sim(mdl);
Real Burst Matrix Solve using QR Decomposition outputs data one row at a time. Whenever the block outputs a result row, it asserts validOut
. The solution is output in the field X_out
in the variable out
. This data is still stored as a 1
-by- p
-by- n*num_samples
array of rows. Before evaluating the accuracy of the solution in MATLAB®, it is convenient to convert it to a n
-by- p
-by- num_samples
array. This can be done with the utility function matrixSolveOutputToMatrix
.
X = matrixSolveOutputToMatrix(out.X_out,n,p,num_samples);
The accuracy of the solution calculated in our Simulink® model can easily be calculated in MATLAB®. Note that theory expects that the relative error of the solution of a system increases with the condition number of the matrix. To see this, we will plot the error given by
against the condition number of .
condNumber = zeros(1,num_samples); relativeError = zeros(1,num_samples); for idx = 1:num_samples condNumber(idx) = cond(A(:,:,idx)); relativeError(idx) = norm(B(:,:,idx) - A(:,:,idx)*X(:,:,idx))./norm(B(:,:,idx)); end
Now, plot the output of the error calculation. Note that log scales are used on both axes, as both the condition numbers and errors span several orders of magnitude. The positive correlation between the two values is quite clear here.
figure(1); clf; h1 = axes(); hold on; plot(condNumber,relativeError,'bo'); h1.XScale = 'log'; h1.YScale = 'log'; xlabel('cond(A)'); ylabel('Relative Error');
The double-precision algorithm simulated above is quite accurate. Maintaining high numerical accuracy in a fixed-point version of the design requires input and output datatypes that are large enough to avoid overflow, which can occur due to the growth in magnitude that occurs in either the QR decomposition or back substitute steps.
For Real Burst QR Decomposition, the type of R(i,:)
is the same as the type of A(i,:)
. Similarly, the type of C(i,:)
is the same as the type of B(i,:)
. These types are used throughout the calculation. Thus, the input datatypes need to accommodate the largest possible values needed in the calculation of each of R(i,:)
and C(i,:)
.
There are two effects that cause the input data to grow in magnitude during the computation: the growth of elements in A
due to the Euclidean norm preserving nature of the rotations used in the QR Decomposition, and the intrinsic internal growth of the CORDIC implementation of these rotations. These two effects bound the elements of the outputs in magnitude by
ceil(log2(1.65*sqrt(m)*max(abs(A(:)))))
for R(i,:)
, and
ceil(log2(1.65*sqrt(m)*max(abs(B(:)))))
for C(i,:)
. In this example, since A
and B
were drawn from the same statistical distribution, we give both the same word length and fraction length. In general, A
and B
do not need to have the same fraction length, provided that they have the same word length.
maximumAbsValueInDataset = max(max(abs(A(:))),max(abs(B(:)))) inputIntegerLength = ceil(log2(1.65*sqrt(m)*maximumAbsValueInDataset))
maximumAbsValueInDataset = 3.5784 inputIntegerLength = 4
We will use an 18 bit wordlength type as our input. This can be changed to suit different applications. That variable inputType
calculated below encodes the type information in a NumericType object.
inputWordLength = 18 inputFractionLength = 18 - inputIntegerLength - 1 inputType = numerictype(1, inputWordLength, inputFractionLength)
inputWordLength = 18 inputFractionLength = 13 inputType = DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 18 FractionLength: 13
For a detailed description of the growth bounds in the CORDIC QR Decomposition Algorithm, see Prevent Overflow in Fixed Point R.
The data can also grow in the back substitute portion of the algorithm due to division by small values. By simulating with relevant input data, it is possible to make a good output datatype choice. For this step, the magnitude of the output value of X(i,:)
is bounded by
1.65*sqrt(m)*max(abs(B(:)))/sigma_min,
where sigma_min
is the smallest singular value observed across all of our sample matrices. These values are calculated below.
sigma = zeros(1, num_samples); for idx = 1:num_samples sigma(idx) = min(svd(A(:,:,idx))); end sigma_min = min(sigma); backsubstituteWordlengthGrowth = ceil(log2(1.65*sqrt(m)/sigma_min));
This gives an output datatype given as shown below
outputWordlength = inputWordLength + backsubstituteWordlengthGrowth outputFractionLength = inputFractionLength OutputType = numerictype(1, outputWordlength, outputFractionLength)
outputWordlength = 30 outputFractionLength = 13 OutputType = DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 30 FractionLength: 13
To simulate in fixed point, first convert the input data to the calculated input type.
A_Fixed = fi(A, inputType); B_Fixed = fi(B, inputType);
The existing model can be updated by updating the variables A
, B
, and OutputType
in the model workspace. However, the model creation function also has an option to create a version with the output type specified.
mdl_fi = fixed.getMatrixSolveModel(A_Fixed, B_Fixed, OutputType);
out_fi = sim(mdl_fi);
X_fi = matrixSolveOutputToMatrix(out_fi.X_out,n,p,num_samples);
Since we already know the condition number of the input matrices from our input calculation, we can calculate the relative error in the fixed-point solution using these values.
relativeErrorFixedPoint = zeros(1, num_samples); for idx = 1:num_samples relativeErrorFixedPoint(idx) = norm(double(B_Fixed(:,:,idx) - A_Fixed(:,:,idx)*X_fi(:,:,idx)))./norm(double(B_Fixed(:,:,idx))); end
Again we will plot the error of the calculation, this time using the fixed-point results. Note that the error is much larger here due to quantization.
figure(2); clf; h2 = axes(); hold on; plot(condNumber,relativeErrorFixedPoint,'bo'); h2.XScale = 'log'; h2.YScale = 'log'; xlabel('cond(A)'); ylabel('Relative Error');