Parallel STL is an implementation of the C++ standard library algorithms with support for execution policies, as specified in the ISO/IEC 14882:2017 standard, commonly called C++17. The implementation also supports the unsequenced execution policy specified in the Parallelism TS version 2 and proposed for the next version of the C++ standard in the C++ working group paper P1001R1.
Parallel STL offers efficient support for both parallel and vectorized execution of algorithms for Intel® processors. For sequential execution, it relies on an available implementation of the C++ standard library.
Parallel STL is available as a part of Intel® Parallel Studio XE and Intel® System Studio.
To use Parallel STL, you must have the following software installed:
The latest version of the Intel® C++ Compiler is recommended, because it gives better performance for Parallel STL algorithms than previous compiler versions.
To build an application that uses Parallel STL on the command line, you need to set the environment variables for compilation and linkage. You can do this by calling suite-level environment scripts such as compilervars.{sh|csh|bat}, or you can set just the Parallel STL environment variables by running pstlvars.{sh|csh|bat} in <install_dir>/{linux|mac|windows}/pstl/bin.
<install_dir> is the installation directory. By default, it is:
For Linux* and macOS*:
/opt/intel/compilers_and_libraries_<version>
$HOME/intel/compilers_and_libraries_<version>
For Windows*:
<Program Files>\IntelSWTools\compilers_and_libraries_<version>
Follow these steps to add Parallel STL to your application:
Add the <install_dir>/include folder to the compiler include paths. You can do this by calling the pstlvars script.
Add #include "pstl/execution" to your code. Then add a subset of the following set of lines, depending on the algorithms you intend to use:
#include "pstl/algorithm"
#include "pstl/numeric"
#include "pstl/memory"
For the execution policies, use the namespace std::execution if there is no vendor implementation of the C++17 standard library, or pstl::execution otherwise. See the 'Examples' section below.
For any of the implemented algorithms, pass one of the values seq, unseq, par or par_unseq as the first parameter in a call to the algorithm to specify the desired execution policy. The policies have the following meaning:
Execution policy | Meaning |
---|---|
seq | Sequential execution. |
unseq | Unsequenced SIMD execution. This policy requires that all functions provided are SIMD-safe. |
par | Parallel execution by multiple threads. |
par_unseq | Combined effect of unseq and par. |
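As a rough illustration of the SIMD-safety requirement in the table above, the sketch below contrasts two function objects. It assumes that "SIMD-safe" means calls for different elements may be interleaved on one thread, so a function must not take locks or rely on state carried from one call to the next; this document does not define the term precisely. The names Scale, RunningSum and scaled are hypothetical, and the execution policy argument is left out so that the code builds without Parallel STL.

```cpp
#include <algorithm>
#include <vector>

// Works only on its own argument: no locks, no shared mutable state.
// Under the assumption above, this kind of function suits unseq.
struct Scale {
    float factor;
    float operator()(float x) const { return x * factor; }
};

// Counter-example: every call reads and updates `total`, so the result
// depends on the order of calls. Do not pass this with unseq or par_unseq.
struct RunningSum {
    float total = 0.0f;
    float operator()(float x) { total += x; return total; }
};

// Hypothetical helper: element-wise scaling with the standard sequential
// overload of std::transform (no execution policy argument).
inline std::vector<float> scaled(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), Scale{factor});
    return out;
}
```

With Parallel STL available, the same call would take the policy as its first argument, as in the examples later in this document.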
Compile the code as C++11 (or later), using compiler options for vectorization:
-qopenmp-simd (Linux* and macOS*)
/Qopenmp-simd (Windows*)
To get good performance, specify the target platform. For the Intel® C++ Compiler, some of the relevant options are:
-xHOST, -xSSE4.1, -xCORE-AVX2, -xMIC-AVX512 (Linux* and macOS*)
/QxHOST, /QxSSE4.1, /QxCORE-AVX2, /QxMIC-AVX512 (Windows*)
Link with the Intel TBB dynamic library for parallelism. For the Intel® C++ Compiler, use the options:
-tbb (Linux* and macOS*)
/Qtbb (Windows*; optional, this should be handled by #pragma comment(lib, <libname>))
The following macros are related to versioning. You should not redefine these macros.
PSTL_VERSION
Current Parallel STL version. The value is a decimal numeral of the form xyy where x is the major version number and yy is the minor version number.
PSTL_VERSION_MAJOR
PSTL_VERSION/100; that is, the major version number.
PSTL_VERSION_MINOR
PSTL_VERSION - PSTL_VERSION_MAJOR * 100; that is, the minor version number.
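The arithmetic behind the versioning macros can be checked with a small sketch. The helper names version_major and version_minor are hypothetical, and 203 is only a sample value of the form xyy, not an actual Parallel STL release number.

```cpp
// Hypothetical helpers that mirror the documented macro arithmetic.
constexpr int version_major(int version) {
    return version / 100;  // same as PSTL_VERSION/100
}

constexpr int version_minor(int version) {
    // same as PSTL_VERSION - PSTL_VERSION_MAJOR * 100
    return version - version_major(version) * 100;
}

// Sample value only: 203 decodes to major 2, minor 3.
static_assert(version_major(203) == 2, "major version of 203 is 2");
static_assert(version_minor(203) == 3, "minor version of 203 is 3");
```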
PSTL_USE_PARALLEL_POLICIES
This macro controls the use of parallel policies.
When set to 0, it disables the par and par_unseq policies, making their use a compilation error. Setting it to 0 is recommended for code that only uses vectorization with the unseq policy, to avoid a dependency on the TBB runtime library.
When the macro is not defined (default) or evaluates to a non-zero value, all execution policies are enabled.
PSTL_USE_NONTEMPORAL_STORES
This macro enables the use of #pragma vector nontemporal in the algorithms std::copy, std::copy_n, std::fill, std::fill_n, std::generate, std::generate_n, std::move, std::rotate, std::rotate_copy, std::swap_ranges with the unseq policy. For further details about the pragma, see the User and Reference Guide for the Intel® C++ Compiler at https://software.intel.com/en-us/node/524559.
If the macro evaluates to a non-zero value, the use of #pragma vector nontemporal is enabled.
When the macro is not defined (default) or set to 0, the macro does nothing.
PSTL_USAGE_WARNINGS
This macro enables Parallel STL to emit compile-time messages, such as warnings about an algorithm not supporting a certain execution policy.
When set to 1, the macro allows the implementation to emit usage warnings. When the macro is not defined (default) or evaluates to zero, usage warnings are disabled.
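The configuration macros above might be used as in the following sketch, which turns off the parallel policies for code that only needs unseq. It assumes the macro has to be defined before the first Parallel STL header is included, which is the usual convention for configuration macros but is not stated in this document. DEMO_HAVE_PSTL and scale_in_place are hypothetical names, and the __has_include guard is there only so the sketch still builds where the Parallel STL headers are not installed.

```cpp
// Assumption: define the macro before any Parallel STL header is included.
#define PSTL_USE_PARALLEL_POLICIES 0

#include <algorithm>
#include <cstddef>

#if defined(__has_include)
#  if __has_include("pstl/execution") && __has_include("pstl/algorithm")
#    include "pstl/execution"
#    include "pstl/algorithm"
#    define DEMO_HAVE_PSTL 1  // hypothetical marker used by this sketch only
#  endif
#endif

// Hypothetical helper: multiplies every element by `factor`.
inline void scale_in_place(float* data, std::size_t n, float factor) {
    auto multiply = [factor](float& x) { x *= factor; };
#ifdef DEMO_HAVE_PSTL
    // Vectorized path; par and par_unseq are disabled by the macro above.
    std::for_each(pstl::execution::unseq, data, data + n, multiply);
#else
    // Fallback: plain sequential standard library call.
    std::for_each(data, data + n, multiply);
#endif
}
```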
Example 1
The following code calls vectorized copy:
#include "pstl/execution"
#include "pstl/algorithm"
void foo(float* a, float* b, int n) {
    std::copy(pstl::execution::unseq, a, a+n, b);
}
Example 2
This example calls the parallelized version of fill_n:
#include <vector>
#include "pstl/execution"
#include "pstl/algorithm"
int main()
{
    std::vector<int> data(10000000);
    std::fill_n(pstl::execution::par_unseq, data.begin(), data.size(), -1); // Fill the vector with -1
    return 0;
}
The following table specifies whether parallel and unsequenced execution are supported for each of the C++17 algorithms accepting execution policies. Using an unsupported combination of algorithm and execution policy will result in sequential execution.
Algorithm | Algorithm page at cppreference.com | Implementation |
---|---|---|
adjacent_difference | http://en.cppreference.com/w/cpp/algorithm/adjacent_difference | parallel, unsequenced |
adjacent_find | | parallel, unsequenced |
all_of | | parallel, unsequenced |
any_of | | parallel, unsequenced |
copy | | parallel, unsequenced |
copy_if | | parallel, unsequenced |
copy_n | | parallel, unsequenced |
count | | parallel, unsequenced |
count_if | | parallel, unsequenced |
destroy | | parallel, unsequenced |
destroy_n | | parallel, unsequenced |
equal | | parallel, unsequenced |
exclusive_scan | | parallel, unsequenced |
fill | | parallel, unsequenced |
fill_n | | parallel, unsequenced |
find | | parallel, unsequenced |
find_end | | parallel, unsequenced |
find_first_of | | parallel, unsequenced |
find_if | | parallel, unsequenced |
find_if_not | | parallel, unsequenced |
for_each | | parallel, unsequenced |
for_each_n | | parallel, unsequenced |
generate | | parallel, unsequenced |
generate_n | | parallel, unsequenced |
includes | | parallel |
inclusive_scan | | parallel, unsequenced |
inplace_merge | | parallel |
is_heap | | parallel, unsequenced |
is_heap_until | | parallel, unsequenced |
is_partitioned | | parallel, unsequenced |
is_sorted | | parallel, unsequenced |
is_sorted_until | | parallel, unsequenced |
lexicographical_compare | http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare | parallel, unsequenced |
max_element | | parallel, unsequenced |
merge | | parallel |
min_element | | parallel, unsequenced |
minmax_element | | parallel, unsequenced |
mismatch | | parallel, unsequenced |
move | | parallel, unsequenced |
none_of | | parallel, unsequenced |
nth_element | | parallel |
partial_sort | | parallel |
partial_sort_copy | http://en.cppreference.com/w/cpp/algorithm/partial_sort_copy | parallel |
partition | | parallel |
partition_copy | | parallel, unsequenced |
reduce | | parallel, unsequenced |
remove | | parallel, unsequenced |
remove_copy | | parallel, unsequenced |
remove_copy_if | | parallel, unsequenced |
remove_if | | parallel, unsequenced |
replace | | parallel, unsequenced |
replace_copy | | parallel, unsequenced |
replace_copy_if | | parallel, unsequenced |
replace_if | | parallel, unsequenced |
reverse | | parallel, unsequenced |
reverse_copy | | parallel, unsequenced |
rotate | | parallel, unsequenced |
rotate_copy | | parallel, unsequenced |
search | | parallel, unsequenced |
search_n | | parallel, unsequenced |
set_difference | | parallel |
set_intersection | | parallel |
set_symmetric_difference | http://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference | parallel |
set_union | | parallel |
sort | | parallel |
stable_partition | | parallel |
stable_sort | | parallel |
swap_ranges | | parallel, unsequenced |
transform | | parallel, unsequenced |
transform_exclusive_scan | http://en.cppreference.com/w/cpp/algorithm/transform_exclusive_scan | parallel, unsequenced |
transform_inclusive_scan | http://en.cppreference.com/w/cpp/algorithm/transform_inclusive_scan | parallel, unsequenced |
transform_reduce | | parallel, unsequenced |
uninitialized_copy | | parallel, unsequenced |
uninitialized_copy_n | http://en.cppreference.com/w/cpp/memory/uninitialized_copy_n | parallel, unsequenced |
uninitialized_default_construct | http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct | parallel, unsequenced |
uninitialized_default_construct_n | http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct_n | parallel, unsequenced |
uninitialized_fill | | parallel, unsequenced |
uninitialized_fill_n | http://en.cppreference.com/w/cpp/memory/uninitialized_fill_n | parallel, unsequenced |
uninitialized_move | | parallel, unsequenced |
uninitialized_move_n | http://en.cppreference.com/w/cpp/memory/uninitialized_move_n | parallel, unsequenced |
uninitialized_value_construct | http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct | parallel, unsequenced |
uninitialized_value_construct_n | http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct_n | parallel, unsequenced |
unique | | parallel |
unique_copy | | parallel, unsequenced |
The unseq and par_unseq policies only have effect with compilers that support #pragma omp simd or #pragma simd.
For most algorithms, parallel and vector execution is supported only if random access iterators are provided; for other iterator types, execution remains serial. The exceptions are for_each and transform, which also support parallel execution with forward iterators.
With forward iterators, each invocation of the user-provided function should do enough work for parallel execution to be effective.
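Whether a given iterator type falls under the random access rule above can be checked at compile time. The following is a minimal sketch using standard type traits; the name is_random_access_v is hypothetical and is not part of Parallel STL.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when the iterator's category is
// random_access_iterator_tag or a tag derived from it.
template <class Iterator>
constexpr bool is_random_access_v =
    std::is_base_of_v<std::random_access_iterator_tag,
                      typename std::iterator_traits<Iterator>::iterator_category>;

// std::vector iterators and raw pointers qualify for parallel and vector execution.
static_assert(is_random_access_v<std::vector<int>::iterator>, "vector is random access");
static_assert(is_random_access_v<int*>, "pointers are random access");

// std::list iterators are bidirectional, so most algorithms would run serially.
static_assert(!is_random_access_v<std::list<int>::iterator>, "list is not random access");
```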
The semantics of the following algorithms do not allow unsequenced execution: includes, inplace_merge, merge, set_difference, set_intersection, set_symmetric_difference, set_union, stable_partition, unique.
The initial value type for exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan shall satisfy the DefaultConstructible requirements. A default-constructed instance of the initial value type shall be the identity element for binary_op.
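The identity requirement matters most when binary_op is not addition. A value-initialized int is 0, which is the identity for plus but not for multiplication. The sketch below shows one way to meet the requirement for a running product: a wrapper type whose default-constructed value is 1. Factor and running_products are hypothetical names, and the standard sequential overload of std::exclusive_scan is used only to show the type design, not the Parallel STL overloads.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical wrapper: a default-constructed Factor holds 1.0,
// the identity element for multiplication.
struct Factor {
    double value = 1.0;
};

inline Factor operator*(Factor a, Factor b) {
    return Factor{a.value * b.value};
}

// Exclusive running product: out[i] is the product of in[0] .. in[i-1].
inline std::vector<Factor> running_products(const std::vector<Factor>& in) {
    std::vector<Factor> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(),
                        Factor{},                    // initial value = identity
                        std::multiplies<Factor>{});  // binary_op
    return out;
}
```

For the input 2, 3, 4 this produces 1, 2, 6.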
For max_element, min_element, minmax_element, partial_sort, partial_sort_copy, sort, stable_sort, the dereferenced value type of the provided iterators shall be DefaultConstructible.
For remove, remove_if, unique, the dereferenced value type of the provided iterators shall be MoveConstructible.
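These two value type requirements can be approximated with standard type traits before calling the affected algorithms. The sketch below is illustrative only: the trait and type names are hypothetical, and std::is_default_constructible and std::is_move_constructible are close to, but not identical with, the named requirements DefaultConstructible and MoveConstructible.

```cpp
#include <iterator>
#include <type_traits>

template <class Iterator>
using value_type_of = typename std::iterator_traits<Iterator>::value_type;

// Approximates the requirement for max_element, sort, stable_sort and the others listed.
template <class Iterator>
constexpr bool value_default_constructible_v =
    std::is_default_constructible_v<value_type_of<Iterator>>;

// Approximates the requirement for remove, remove_if and unique.
template <class Iterator>
constexpr bool value_move_constructible_v =
    std::is_move_constructible_v<value_type_of<Iterator>>;

// Hypothetical type with no default constructor.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};

// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

static_assert(value_default_constructible_v<int*>, "int is default constructible");
static_assert(!value_default_constructible_v<NoDefault*>, "NoDefault is not");
static_assert(!value_move_constructible_v<Pinned*>, "Pinned is not move constructible");
```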
The following algorithms require additional O(n) memory space for parallel execution: copy_if, inplace_merge, partial_sort, partial_sort_copy, partition_copy, remove, remove_if, rotate, sort, stable_sort, unique, unique_copy.
Optimization Notice |
---|
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 |
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Copyright 2017-2019 Intel Corporation.
This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.
This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.