Speed Up for-Loop Implementation in Code Generated by Using parfor

When you generate C/C++ code for a model that contains a MATLAB Function block, a MATLAB System block, or a For Each Subsystem block, the code generator by default produces code that implements for-loops in a single thread. For MATLAB Function and MATLAB System blocks, you can optimize such a loop by writing it as a parfor-loop. The iterations of the parfor-loop can run in parallel on multiple cores on the target hardware.

Running the iterations in parallel might significantly improve execution speed of generated code. For more information, see How parfor-Loops Improve Execution Speed.

The code generator implements the for-loops in parallel by using OpenMP.

Embedded Coder™ software uses the Open Multiprocessing (OpenMP) application interface to support shared-memory, multicore code generation. By default, Embedded Coder uses as many threads as it finds available. If you specify the number of threads to use, Embedded Coder uses at most that number of threads, even if additional threads are available. For more information, see parfor.

How parfor-Loops Improve Execution Speed

A parfor-loop might provide better execution speed than its analogous for-loop because several threads can compute concurrently on the same loop.

Each execution of the body of a parfor-loop is called an iteration. The threads evaluate iterations in an arbitrary order and independently of each other. Because each iteration is independent, the threads do not have to be synchronized. If the number of threads is equal to the number of loop iterations, each thread performs one iteration of the loop. If the number of iterations is greater than the number of threads, some threads perform more than one loop iteration.

For example, when a loop of 100 iterations runs on 20 threads, each thread executes five iterations, and the threads run simultaneously. If your loop takes a long time to run because it has many iterations or because individual iterations are lengthy, you can reduce the run time significantly by using multiple threads. In this example, the speedup might be less than 20-fold because of parallelization overheads, such as thread creation and deletion.
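The arithmetic in this example can be sketched in C. The helper below is hypothetical (`iterations_for_thread` is not part of Embedded Coder or OpenMP) and assumes that iterations are split as evenly as possible; the actual assignment of iterations to threads is decided by the OpenMP runtime.

```c
/* Hypothetical helper: number of iterations that thread `tid` (0-based)
 * receives when `n_iter` iterations are split as evenly as possible
 * across `n_threads` threads. Illustrative only; the OpenMP runtime
 * chooses the real schedule. */
int iterations_for_thread(int n_iter, int n_threads, int tid)
{
    int base = n_iter / n_threads;   /* iterations that every thread gets */
    int extra = n_iter % n_threads;  /* leftover iterations */

    /* The first `extra` threads take one leftover iteration each. */
    return base + (tid < extra ? 1 : 0);
}
```

With 100 iterations and 20 threads, every thread receives five iterations. With 10 iterations and 4 threads, two threads receive three iterations and two threads receive two.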

When to Use parfor-Loops

Use parfor when you have:

  • Many iterations of a simple calculation. parfor divides the loop iterations into groups so that each thread executes one group of iterations.

  • A loop iteration that takes a long time to execute. parfor executes the iterations simultaneously on different threads. Although this simultaneous execution does not reduce the time spent on an individual iteration, it might significantly reduce overall time spent on the loop.
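A loop with many independent iterations of a simple calculation might look like the following in C with OpenMP, the interface that the generated code relies on. This is a hand-written sketch, not Embedded Coder output, and `scale_vector` and its parameters are hypothetical names.

```c
/* Multiply every element of `x` by `gain` and store the result in `y`.
 * Iteration i reads and writes only index i, so the iterations are
 * independent and can be divided among threads. */
void scale_vector(const double *x, double *y, int n, double gain)
{
    int i;

    /* Without OpenMP support enabled in the compiler, this pragma is
     * ignored and the loop runs in a single thread. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = gain * x[i];
    }
}
```

Because no iteration uses a value computed by another iteration, the result is the same regardless of how many threads run the loop or in which order they run.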

When Not to Use parfor-Loops

Do not use parfor when:

  • An iteration of your loop depends on other iterations. Running the iterations in parallel can lead to erroneous results.

    To help you avoid using parfor when an iteration of your loop depends on other iterations, Embedded Coder specifies a rigid classification of variables. For more information, see Classification of Variables in parfor-Loops. If Embedded Coder detects loops that do not conform to the parfor specifications, it does not generate code and produces an error.

    Reductions are an exception to the rule that loop iterations must be independent. A reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order. For more information, see Reduction Variables.

  • There are only a few iterations that perform some simple calculations.

    Note

    For a small number of loop iterations, parallelization overheads might outweigh any gain in execution speed. Such overheads include the time taken for thread creation, data synchronization between threads, and thread deletion.

  • A For Each Subsystem block contains an S-Function block. In this case, the generated code does not contain parallel for-loops.
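The difference between a reduction and a true dependence between iterations can be sketched in C with OpenMP. Both functions are hypothetical, hand-written illustrations and not Embedded Coder output. Integer arithmetic is used so that the reduction result does not depend on the order in which partial sums are combined.

```c
/* Reduction: `sum` depends on all iterations together, but not on their
 * order. Each thread accumulates a private partial sum, and OpenMP
 * combines the partial sums when the loop ends. */
long sum_of_squares(int n)
{
    long sum = 0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++) {
        sum += (long)i * i;
    }
    return sum;
}

/* True dependence: iteration i reads the value written by iteration
 * i - 1, so this loop is deliberately left serial. */
void running_total(const int *x, int *out, int n)
{
    int i;

    if (n <= 0) {
        return;
    }
    out[0] = x[0];
    for (i = 1; i < n; i++) {
        out[i] = out[i - 1] + x[i];
    }
}
```

The first loop is a candidate for parallel execution because only the accumulated total crosses iterations. The second loop is not, because each iteration needs the result of the previous one.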

Write Code by Using parfor-Loops

To run for-loops in parallel in the generated code, write the loops with parfor inside a MATLAB Function block or a MATLAB System block.

  1. Create a Simulink™ model.

  2. Add the MATLAB Function or the MATLAB System block to the model.

  3. Add the code to the MATLAB Function or the MATLAB System block.

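    function y = access3a(u) %#codegen

    %   Copyright 2010 The MathWorks, Inc.

    % Persistent state, initialized on the first call.
    persistent pA;
    if isempty(pA)
        pA = 0;
    end
    A = ones(20,50);
    t = 0;

    % Increment the first column of rows 1 to 10, using at most 4 threads.
    parfor (i = 1:10,4)
        A(i,1) = A(i,1) + 1;
    end

    y = A(1,4) + u + t + pA;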

  4. In the Optimization pane, select Maximize execution speed from the Priority drop-down list. The Generate parallel for-loops parameter is then selected automatically. This parameter enables the compiler to compute loops in parallel.

  5. Connect the blocks.

  6. Build the model and generate code.

    In the generated code, the OpenMP pragma instructs the compiler to execute the loop in parallel across multiple threads:

    #pragma omp parallel for num_threads(4 > omp_get_max_threads() ? omp_get_max_threads() : 4)

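    The number 4 is the maximum number of threads that the parfor statement requests. The num_threads expression evaluates to the smaller of 4 and the value that omp_get_max_threads() returns.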
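    The thread-count expression in the pragma can be read as a minimum of two values. The following C sketch shows that logic applied to a loop like the one in access3a. It is hand-written for illustration and is not actual Embedded Coder output; `threads_to_use`, `available_threads`, and `increment_first_column` are hypothetical names, and the fallback of one thread without OpenMP is an assumption of this sketch.

```c
#ifdef _OPENMP
#include <omp.h>   /* omp_get_max_threads */
#endif

#define ROWS 20
#define COLS 50

/* Hypothetical helper: the smaller of the requested and the available
 * thread counts, as in the num_threads expression of the pragma. */
int threads_to_use(int requested, int available)
{
    return requested > available ? available : requested;
}

/* Hypothetical helper: thread count reported by OpenMP, or 1 when the
 * code is compiled without OpenMP support (assumption of this sketch). */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Fill a 20-by-50 array with ones, increment the first column of the
 * first 10 rows in parallel, and return A[0][0] + A[9][0] + A[10][0]. */
double increment_first_column(void)
{
    double A[ROWS][COLS];
    int r, c, i;

    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            A[r][c] = 1.0;
        }
    }

    /* Each iteration writes a different row, so iterations are independent. */
    #pragma omp parallel for num_threads(threads_to_use(4, available_threads()))
    for (i = 0; i < 10; i++) {
        A[i][0] = A[i][0] + 1.0;
    }

    return A[0][0] + A[9][0] + A[10][0];
}
```

    On a target that reports two available threads, the loop uses two threads. On a target that reports eight, it uses four.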

Because the loop body can execute in parallel on multiple threads, it must conform to certain restrictions. If the Embedded Coder software detects loops that do not conform to parfor specifications, it produces an error. For more information, see parfor Restrictions.
