One of the optimization features of the Target Language Compiler is the intrinsic support for loop rolling. Based on a specified threshold, code generation for looping operations can be unrolled or left as a loop (rolled).
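As a rough illustration of the difference, the following sketch shows the same element-wise operation in rolled and unrolled form. The width of 4 and the function names `times_two_rolled` and `times_two_unrolled` are hypothetical, chosen only for this example; they are not output of the Target Language Compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signal width, chosen only for this illustration. */
#define WIDTH 4

/* Rolled form: the operation stays inside a for loop. */
static void times_two_rolled(const double *u, double *y)
{
    size_t i;
    assert(u != NULL && y != NULL);
    for (i = 0; i < WIDTH; i++) {
        y[i] = u[i] * 2.0;
    }
}

/* Unrolled form: one statement per element, with no loop counter. */
static void times_two_unrolled(const double *u, double *y)
{
    assert(u != NULL && y != NULL);
    y[0] = u[0] * 2.0;
    y[1] = u[1] * 2.0;
    y[2] = u[2] * 2.0;
    y[3] = u[3] * 2.0;
}
```

Both functions compute the same result. The unrolled form trades code size for the removal of loop overhead, which is why a width threshold is a natural way to choose between them.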
Coupled with loop rolling is the concept of noncontiguous signals. Consider the following model:
The input to the timestwo S-function comes from two arrays located at two different memory locations, one for the output of block source1 and one for the output of block source2. This is because of an optimization that makes the Mux block virtual, meaning that code is not explicitly generated for the Mux block and thus processor cycles are not spent evaluating it (i.e., it becomes a pure graphical convenience for the block diagram). In this case, this is represented in the model.rtw file as follows:
Block {
  Type                    "S-Function"
  MaskType                "S-function: timestwo"
  BlockIdx                [0, 0, 2]
  SL_BlockIdx             2
  GrSrc                   [0, 1]
  ExprCommentInfo {
    SysIdxList              []
    BlkIdxList              []
    PortIdxList             []
  }
  ExprCommentSrcIdx {
    SysIdx                  -1
    BlkIdx                  -1
    PortIdx                 -1
  }
  Name                    "<Root>/timestwo C-MEX S-Function"
  SLName                  "<Root>/timestwo \nC-MEX S-Function"
  Identifier              timestwoCMEXSFunction
  TID                     0
  RollRegions             [0:19, 20:49]
  NumDataInputPorts       1
  DataInputPort {
    SignalSrc               [b0@20, b1@30]
    SignalOffset            [0:19, 0:29]
    Width                   50
    RollRegions             [0:19, 20:49]
  }
  NumDataOutputPorts      1
  DataOutputPort {
    SignalSrc               [b2@50]
    SignalOffset            [0:49]
    Width                   50
  }
  Connections {
    InputPortContiguous     [no]
    InputPortConnected      [yes]
    OutputPortConnected     [yes]
    OutputPortBeingMerged   [no]
    DirectSrcConn           [no]
    DirectDstConn           [yes]
    DataOutputPort {
      NumConnPoints           1
      ConnPoint {
        SrcSignal               [0, 50]
        DstBlockAndPortEl       [0, 4, 0, 0]
      }
    }
  }
  .
  .
  .
From this fragment of the model.rtw file, you can see that the block and input port RollRegions entries are not just one range, but two groups of numbers. This denotes two groupings in memory for the input signal. The generated code looks like this:
/* S-Function Block: <Root>/timestwo C-MEX S-Function */
/* Multiply input by two */
{
  int_T i1;
  const real_T *u0 = &contig_sample_B.u[0];
  real_T *y0 = contig_sample_B.timestwoCMEXSFunction_m;

  for (i1=0; i1 < 20; i1++) {
    y0[i1] = u0[i1] * 2.0;
  }
  u0 = &contig_sample_B.u_o[0];
  y0 = &contig_sample_B.timestwoCMEXSFunction_m[20];

  for (i1=0; i1 < 30; i1++) {
    y0[i1] = u0[i1] * 2.0;
  }
}
Notice that two loops are generated and, between them, the input signal is redirected from the first base address, &contig_sample_B.u[0], to the second base address of the signals, &contig_sample_B.u_o[0].
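The pattern in the generated code (one loop per contiguous run of input, with the base address switched between loops) can be sketched in a table-driven form. The `InputRegion` type and the `times_two_regions` function below are hypothetical names introduced only for this illustration; the code generator does not emit them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one contiguous run of input elements. */
typedef struct {
    const double *src;   /* base address of the run */
    size_t        width; /* number of elements in the run */
} InputRegion;

/* Multiply every input element by two, using one loop per region.
 * The output is written contiguously, as in the fragment above. */
static void times_two_regions(const InputRegion *regions,
                              size_t num_regions,
                              double *y)
{
    size_t r, i;
    assert(regions != NULL && y != NULL);
    for (r = 0; r < num_regions; r++) {
        const double *u0 = regions[r].src;  /* redirect input base address */
        for (i = 0; i < regions[r].width; i++) {
            y[i] = u0[i] * 2.0;
        }
        y += regions[r].width;              /* advance output base address */
    }
}
```

Called with two regions of widths 20 and 30, this performs the same work as the two loops shown above, with the region boundaries held in data rather than in the code.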
If you do not want to support this in your S-function or your generated code, you can use
ssSetInputPortRequiredContiguous(S, 0, 1);
in the mdlInitializeSizes function (the arguments are the SimStruct, the input port index, and the flag value) to cause Simulink® to implicitly generate code that performs a buffering operation. This option uses both extra memory and CPU cycles at run time, but might be worth it if your algorithm performance increases enough to offset the overhead of the buffering.
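The cost of buffering can be pictured with the following sketch, which copies two separate source arrays into one contiguous buffer so that a single loop can process the whole signal. This is only an assumption about the general shape of such a step; the code that Simulink actually generates is not shown here, and the names `gather_inputs` and `times_two_contiguous` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy two separate source arrays into one contiguous buffer.
 * buf must have room for n1 + n2 elements (the extra memory). */
static void gather_inputs(const double *src1, size_t n1,
                          const double *src2, size_t n2,
                          double *buf)
{
    assert(src1 != NULL && src2 != NULL && buf != NULL);
    memcpy(buf, src1, n1 * sizeof(double));       /* extra CPU cycles */
    memcpy(buf + n1, src2, n2 * sizeof(double));
}

/* With contiguous input, a single loop covers the whole signal. */
static void times_two_contiguous(const double *u, size_t n, double *y)
{
    size_t i;
    assert(u != NULL && y != NULL);
    for (i = 0; i < n; i++) {
        y[i] = u[i] * 2.0;
    }
}
```

The copy is pure overhead, so the approach pays off only when the algorithm gains more from contiguous input than the copy costs.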
Use the %roll directive to generate loops. See also %roll for the reference entry, and Input Signal Functions for a discussion of the behavior of %roll.