Parallel STL is an implementation of the C++ standard library algorithms with support for execution policies, as specified in ISO/IEC 14882:2017 standard, commonly called C++17. The implementation also supports the unsequenced execution policy specified in Parallelism TS version 2 and proposed for the next version of the C++ standard in the C++ working group paper P1001R1.

Parallel STL offers efficient support for both parallel and vectorized execution of algorithms for Intel® processors. For sequential execution, it relies on an available implementation of the C++ standard library.

Parallel STL is available as a part of Intel® Parallel Studio XE and Intel® System Studio.

Prerequisites

To use Parallel STL, you must have the following software installed:

The latest version of the Intel® C++ Compiler is recommended for better performance of Parallel STL algorithms, comparing to previous compiler versions.

To build an application that uses Parallel STL on the command line, you need to set the environment variables for compilation and linkage. You can do this by calling suite-level environment scripts such as compilervars.{sh|csh|bat}, or you can set just the Parallel STL environment variables by running pstlvars.{sh|csh|bat} in <install_dir>/{linux|mac|windows}/pstl/bin.

<install_dir> is the installation directory, by default, it is:

For Linux* and macOS*:

For Windows*:

Using Parallel STL

Follow these steps to add Parallel STL to your application:

  1. Add the <install_dir>/include folder to the compiler include paths. You can do this by calling the pstlvars script.

  2. Add #include "pstl/execution" to your code. Then add a subset of the following set of lines, depending on the algorithms you intend to use:

    • #include "pstl/algorithm"
    • #include "pstl/numeric"
    • #include "pstl/memory"
  3. When using algorithms and execution policies, specify the namespaces std::execution in case of there is no vendor implementation of C++17 standard library or pstl::execution otherwise. See the 'Examples' section below.
  4. For any of the implemented algorithms, pass one of the values seq, unseq, par or par_unseq as the first parameter in a call to the algorithm to specify the desired execution policy. The policies have the following meaning:

    Execution policy

    Meaning

    seq

    Sequential execution.

    unseq

    Unsequenced SIMD execution. This policy requires that all functions provided are SIMD-safe.

    par

    Parallel execution by multiple threads.

    par_unseq

    Combined effect of unseq and par.

  5. Compile the code as C++11 (or later) and using compiler options for vectorization:

    • For the Intel® C++ Compiler:
      • For Linux* and macOS*: -qopenmp-simd
      • For Windows*: /Qopenmp-simd
    • For other compilers, find a switch that enables OpenMP* 4.0 SIMD constructs.

    To get good performance, specify the target platform. For the Intel® C++ Compiler, some of the relevant options are:

    • For Linux* and macOS*: -xHOST, -xSSE4.1, -xCORE-AVX2, -xMIC-AVX512.
    • For Windows*: /QxHOST, /QxSSE4.1, /QxCORE-AVX2, /QxMIC-AVX512.
    If using a different compiler, see its documentation.

  6. Link with the Intel TBB dynamic library for parallelism. For the Intel® C++ Compiler, use the options:

    • For Linux* and macOS*: -tbb
    • For Windows*: /Qtbb (optional, this should be handled by #pragma comment(lib, <libname>))

Version Macros

Macros related to versioning, as described below. You should not redefine these macros.

PSTL_VERSION

Current Parallel STL version. The value is a decimal numeral of the form xyy where x is the major version number and yy is the minor version number.

PSTL_VERSION_MAJOR

PSTL_VERSION/100; that is, the major version number.

PSTL_VERSION_MINOR

PSTL_VERSION - PSTL_VERSION_MAJOR * 100; that is, the minor version number.

Macros

PSTL_USE_PARALLEL_POLICIES

This macro controls the use of parallel policies.

When set to 0, it disables the par and par_unseq policies, making their use a compilation error. It's recommended for code that only uses vectorization with unseq policy, to avoid dependency on the TBB runtime library.

When the macro is not defined (default) or evaluates to a non-zero value all execution policies are enabled.

PSTL_USE_NONTEMPORAL_STORES

This macro enables the use of #pragma vector nontemporal in the algorithms std::copy, std::copy_n, std::fill, std::fill_n, std::generate, std::generate_n, std::move, std::rotate, std::rotate_copy, std::swap_ranges with the unseq policy. For further details about the pragma, see the User and Reference Guide for the Intel® C++ Compiler at https://software.intel.com/en-us/node/524559.

If the macro evaluates to a non-zero value, the use of #pragma vector nontemporal is enabled.

When the macro is not defined (default) or set to 0, the macro does nothing.

PSTL_USAGE_WARNINGS

This macro enables Parallel STL to emit compile-time messages, such as warnings about an algorithm not supporting a certain execution policy.

When set to 1, the macro allows the implementation to emit usage warnings. When the macro is not defined (default) or evaluates to zero, usage warnings are disabled.

Examples

Example 1

The following code calls vectorized copy:

#include "pstl/execution"
#include "pstl/algorithm"
void foo(float* a, float* b, int n) {
    std::copy(pstl::execution::unseq, a, a+n, b);
}

Example 2

This example calls the parallelized version of fill_n:

#include <vector>
#include "pstl/execution"
#include "pstl/algorithm"

int main()
{
    std::vector<int> data(10000000);
    std::fill_n(pstl::execution::par_unseq, data.begin(), data.size(), -1);  // Fill the vector with -1

    return 0;
}

Algorithms

The following table specifies whether parallel and unsequenced execution are supported for each of the C++17 algorithms accepting execution policies. Using an unsupported combination of algorithm and execution policy will result in sequential execution.

Algorithm

Algorithm page at cppreference.com

Implementation

adjacent_difference

http://en.cppreference.com/w/cpp/algorithm/adjacent_difference

parallel, unsequenced

adjacent_find

http://en.cppreference.com/w/cpp/algorithm/adjacent_find

parallel, unsequenced

all_of

http://en.cppreference.com/w/cpp/algorithm/all_any_none_of

parallel, unsequenced

any_of

http://en.cppreference.com/w/cpp/algorithm/all_any_none_of

parallel, unsequenced

copy

http://en.cppreference.com/w/cpp/algorithm/copy

parallel, unsequenced

copy_if

http://en.cppreference.com/w/cpp/algorithm/copy

parallel, unsequenced

copy_n

http://en.cppreference.com/w/cpp/algorithm/copy_n

parallel, unsequenced

count

http://en.cppreference.com/w/cpp/algorithm/count

parallel, unsequenced

count_if

http://en.cppreference.com/w/cpp/algorithm/count

parallel, unsequenced

destroy

http://en.cppreference.com/w/cpp/memory/destroy

parallel, unsequenced

destroy_n

http://en.cppreference.com/w/cpp/memory/destroy_n

parallel, unsequenced

equal

http://en.cppreference.com/w/cpp/algorithm/equal

parallel, unsequenced

exclusive_scan

http://en.cppreference.com/w/cpp/algorithm/exclusive_scan

parallel, unsequenced

fill

http://en.cppreference.com/w/cpp/algorithm/fill

parallel, unsequenced

fill_n

http://en.cppreference.com/w/cpp/algorithm/fill_n

parallel, unsequenced

find

http://en.cppreference.com/w/cpp/algorithm/find

parallel, unsequenced

find_end

http://en.cppreference.com/w/cpp/algorithm/find_end

parallel, unsequenced

find_first_of

http://en.cppreference.com/w/cpp/algorithm/find_first_of

parallel, unsequenced

find_if

http://en.cppreference.com/w/cpp/algorithm/find

parallel, unsequenced

find_if_not

http://en.cppreference.com/w/cpp/algorithm/find

parallel, unsequenced

for_each

http://en.cppreference.com/w/cpp/algorithm/for_each

parallel, unsequenced

for_each_n

http://en.cppreference.com/w/cpp/algorithm/for_each_n

parallel, unsequenced

generate

http://en.cppreference.com/w/cpp/algorithm/generate

parallel, unsequenced

generate_n

http://en.cppreference.com/w/cpp/algorithm/generate_n

parallel, unsequenced

includes

http://en.cppreference.com/w/cpp/algorithm/includes

parallel

inclusive_scan

http://en.cppreference.com/w/cpp/algorithm/inclusive_scan

parallel, unsequenced

inplace_merge

http://en.cppreference.com/w/cpp/algorithm/inplace_merge

parallel

is_heap

http://en.cppreference.com/w/cpp/algorithm/is_heap

parallel, unsequenced

is_heap_until

http://en.cppreference.com/w/cpp/algorithm/is_heap_until

parallel, unsequenced

is_partitioned

http://en.cppreference.com/w/cpp/algorithm/is_partitioned

parallel, unsequenced

is_sorted

http://en.cppreference.com/w/cpp/algorithm/is_sorted

parallel, unsequenced

is_sorted_until

http://en.cppreference.com/w/cpp/algorithm/is_sorted_until

parallel, unsequenced

lexicographical_compare

http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare

parallel, unsequenced

max_element

http://en.cppreference.com/w/cpp/algorithm/max_element

parallel, unsequenced

merge

http://en.cppreference.com/w/cpp/algorithm/merge

parallel

min_element

http://en.cppreference.com/w/cpp/algorithm/min_element

parallel, unsequenced

minmax_element

http://en.cppreference.com/w/cpp/algorithm/minmax_element

parallel, unsequenced

mismatch

http://en.cppreference.com/w/cpp/algorithm/mismatch

parallel, unsequenced

move

http://en.cppreference.com/w/cpp/algorithm/move

parallel, unsequenced

none_of

http://en.cppreference.com/w/cpp/algorithm/all_any_none_of

parallel, unsequenced

nth_element

http://en.cppreference.com/w/cpp/algorithm/nth_element

parallel

partial_sort

http://en.cppreference.com/w/cpp/algorithm/partial_sort

parallel

partial_sort_copy

http://en.cppreference.com/w/cpp/algorithm/partial_sort_copy

parallel

partition

http://en.cppreference.com/w/cpp/algorithm/partition

parallel

partition_copy

http://en.cppreference.com/w/cpp/algorithm/partition_copy

parallel, unsequenced

reduce

http://en.cppreference.com/w/cpp/algorithm/reduce

parallel, unsequenced

remove

http://en.cppreference.com/w/cpp/algorithm/remove

parallel, unsequenced

remove_copy

http://en.cppreference.com/w/cpp/algorithm/remove_copy

parallel, unsequenced

remove_copy_if

http://en.cppreference.com/w/cpp/algorithm/remove_copy

parallel, unsequenced

remove_if

http://en.cppreference.com/w/cpp/algorithm/remove

parallel, unsequenced

replace

http://en.cppreference.com/w/cpp/algorithm/replace

parallel, unsequenced

replace_copy

http://en.cppreference.com/w/cpp/algorithm/replace_copy

parallel, unsequenced

replace_copy_if

http://en.cppreference.com/w/cpp/algorithm/replace_copy

parallel, unsequenced

replace_if

http://en.cppreference.com/w/cpp/algorithm/replace

parallel, unsequenced

reverse

http://en.cppreference.com/w/cpp/algorithm/reverse

parallel, unsequenced

reverse_copy

http://en.cppreference.com/w/cpp/algorithm/reverse_copy

parallel, unsequenced

rotate

http://en.cppreference.com/w/cpp/algorithm/rotate

parallel, unsequenced

rotate_copy

http://en.cppreference.com/w/cpp/algorithm/rotate_copy

parallel, unsequenced

search

http://en.cppreference.com/w/cpp/algorithm/search

parallel, unsequenced

search_n

http://en.cppreference.com/w/cpp/algorithm/search_n

parallel, unsequenced

set_difference

http://en.cppreference.com/w/cpp/algorithm/set_difference

parallel

set_intersection

http://en.cppreference.com/w/cpp/algorithm/set_intersection

parallel

set_symmetric_difference

http://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference

parallel

set_union

http://en.cppreference.com/w/cpp/algorithm/set_union

parallel

sort

http://en.cppreference.com/w/cpp/algorithm/sort

parallel

stable_sort

http://en.cppreference.com/w/cpp/algorithm/stable_sort

parallel

stable_partition

http://en.cppreference.com/w/cpp/algorithm/stable_partition

parallel

swap_ranges

http://en.cppreference.com/w/cpp/algorithm/swap_ranges

parallel, unsequenced

transform

http://en.cppreference.com/w/cpp/algorithm/transform

parallel, unsequenced

transform_exclusive_scan

http://en.cppreference.com/w/cpp/algorithm/transform_exclusive_scan

parallel, unsequenced

transform_inclusive_scan

http://en.cppreference.com/w/cpp/algorithm/transform_inclusive_scan

parallel, unsequenced

transform_reduce

http://en.cppreference.com/w/cpp/algorithm/transform_reduce

parallel, unsequenced

uninitialized_copy

http://en.cppreference.com/w/cpp/memory/uninitialized_copy

parallel, unsequenced

uninitialized_copy_n

http://en.cppreference.com/w/cpp/memory/uninitialized_copy_n

parallel, unsequenced

uninitialized_default_construct

http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct

parallel, unsequenced

uninitialized_default_construct_n

http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct_n

parallel, unsequenced

uninitialized_fill

http://en.cppreference.com/w/cpp/memory/uninitialized_fill

parallel, unsequenced

uninitialized_fill_n

http://en.cppreference.com/w/cpp/memory/uninitialized_fill_n

parallel, unsequenced

uninitialized_move

http://en.cppreference.com/w/cpp/memory/uninitialized_move

parallel, unsequenced

uninitialized_move_n

http://en.cppreference.com/w/cpp/memory/uninitialized_move_n

parallel, unsequenced

uninitialized_value_construct

http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct

parallel, unsequenced

uninitialized_value_construct_n

http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct_n

parallel, unsequenced

unique

http://en.cppreference.com/w/cpp/algorithm/unique

parallel

unique_copy

http://en.cppreference.com/w/cpp/algorithm/unique_copy

parallel, unsequenced

Known limitations

unseq and par_unseq policies only have effect with compilers that support #pragma omp simd or #pragma simd.

Parallel and vector execution is only supported for most algorithms if random access iterators are provided, while for other iterator types the execution will remain serial, excepting for_each and transform which support parallel execution with forward iterators as well.
In case of forward iterators an execution of the invoked function should have enough work for the parallel execution to be effective.

Semantics of the following algorithms does not allow unsequenced execution: includes, inplace_merge, merge, set_difference, set_intersection, set_symmetric_difference, set_union, stable_partition, unique

The initial value type for exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan shall satisfy the DefaultConstructible requirements. A default-constructed instance of the initial value type shall be the identity element for binary_op.

For max_element, min_element, minmax_element, partial_sort, partial_sort_copy, sort, stable_sort the dereferenced value type of the provided iterators shall be DefaultConstructible.

For remove, remove_if, unique the dereferenced value type of the provided iterators shall be MoveConstructible.

The following algorithms require additional O(n) memory space for parallel execution: copy_if, inplace_merge, partial_sort, partial_sort_copy, partition_copy, remove, remove_if, rotate, sort, stable_sort, unique, unique_copy.