Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/

BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, i960, Intel, the Intel logo, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Copyright(C) 2014-2016 Intel Corporation. All rights reserved.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.

Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

License Definitions

By downloading and installing this sample, you hereby agree that the accompanying materials are being provided to you under the terms and conditions of the End User License Agreement for the Intel® Integrated Performance Primitives (Intel® IPP) product previously accepted by you.

System Requirements

Recommended hardware:

Hardware requirements:

Software requirements:

Product specific requirements:

For more information please see Intel® IPP System Requirements.

Installation and Build

Extract files from the examples package to the desired destination folder. Make sure that the directory structure is preserved.

Setting the Build Environment

There are several methods for preparing the build environment for the Intel® IPP examples. They are described below.

Setting the Build Environment on Windows*

Suppose that you have installed Intel® Parallel Studio to <install_dir> on your computer. By default, <install_dir> is the C:\Program Files (x86)\IntelSWTools directory.

The first and simplest method is to open a command-line window using the menu Start\All Programs\Intel Parallel Studio XE <yyyy>\Compiler and Performance Libraries\Command Prompt with Intel Compiler <vv>\Compiler <vv> for <cpu_arch> Visual Studio <vs_ver> environment.

Here:

  • <yyyy> is the Parallel Studio version (2017 or higher), with specific update and package numbers;

  • <vv> is the Intel® C/C++ compiler version;

  • <cpu_arch> is "IA-32" or "Intel 64".

This opens a command-line window with all the required environment settings prepared.

The second method is to use a specialized script to set up the environment for the Intel® tools from an existing command-line window. The script is <install_dir>\parallel_studio_xe\compilers_and_libraries_<yyyy>\windows\bin\compilervars.bat and takes two arguments: the required CPU architecture ("ia32" or "intel64") and the required Microsoft* Visual Studio* environment ("vs2012", "vs2013", "vs2013shell", or "vs2015").

For example,

>"C:\Program Files (x86)\IntelSWTools\parallel_studio_xe\compilers_and_libraries_2017\windows\bin\compilervars.bat" ia32 vs2012
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
Intel(R) Compiler xx.yy

>set IPPROOT
IPPROOT=C:\Program Files (x86)\IntelSWTools\parallel_studio_xe\compilers_and_libraries_2017\windows\ipp

>set TBBROOT
TBBROOT=C:\Program Files (x86)\IntelSWTools\parallel_studio_xe\compilers_and_libraries_2017\windows\tbb\bin\..

The third method is to set the environment variables for Intel IPP and, optionally, for Intel TBB separately. For that, two scripts are provided in the Parallel Studio directory tree: <install_dir>\parallel_studio_xe_<yyyy>\compilers_and_libraries_<yyyy>\windows\ipp\bin\ippvars.bat and <install_dir>\parallel_studio_xe_<yyyy>\compilers_and_libraries_<yyyy>\windows\tbb\bin\tbbvars.bat.

The fourth method is to set IPPROOT and TBBROOT manually. The main idea is that:

  • IPPROOT/include should point to the Intel IPP include files;

  • IPPROOT/lib should point to the Intel IPP library directories: IPPROOT/lib/ia32 to the 32-bit Intel IPP libraries and IPPROOT/lib/intel64 to the 64-bit Intel IPP libraries. In this case, the command-line window must have the appropriate Visual Studio* environment set.

For example, to set the Microsoft* Visual Studio* 2015 environment and the Intel IPP example environment, run:

>"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\vcvars64.bat"
>set IPPROOT=C:\Program Files (x86)\IntelSWTools\parallel_studio_xe\compilers_and_libraries_2017\windows\ipp
>set TBBROOT=C:\Program Files (x86)\IntelSWTools\parallel_studio_xe\compilers_and_libraries_2017\windows\tbb

Setting the Build Environment on Linux*

Start a terminal window on Linux*. On this system, the Intel® Parallel Studio tools are installed by default to the /opt/intel directory.

You can then use any of the following methods.

$ source /opt/intel/parallel_studio_xe_<yyyy>/compilers_and_libraries_<yyyy>/linux/bin/compilervars.sh intel64 -platform linux
$ echo $IPPROOT; echo $TBBROOT
/opt/intel/compilers_and_libraries_<yyyy>/linux/ipp
/opt/intel/compilers_and_libraries_<yyyy>/linux/tbb

This sets up both the Intel IPP and the Intel TBB environments.

Alternatively, you can set up Intel IPP and Intel TBB separately:

$ source /opt/intel/parallel_studio_xe_<yyyy>/compilers_and_libraries_<yyyy>/linux/ipp/bin/ippvars.sh intel64
$ source /opt/intel/parallel_studio_xe_<yyyy>/compilers_and_libraries_<yyyy>/linux/tbb/bin/tbbvars.sh intel64
$ echo $IPPROOT; echo $TBBROOT
/opt/intel/compilers_and_libraries_<yyyy>/linux/ipp
/opt/intel/compilers_and_libraries_<yyyy>/linux/tbb

Or, you can export the required environment variables manually:

$ export IPPROOT=/opt/intel/compilers_and_libraries_<yyyy>/linux/ipp
$ export TBBROOT=/opt/intel/compilers_and_libraries_<yyyy>/linux/tbb
Note
Using the specialized scripts compilervars, ippvars, or tbbvars has one important benefit: they add the paths to the Intel IPP and Intel TBB dynamic shared objects to the LD_LIBRARY_PATH environment variable, which allows you to run applications dynamically linked against these libraries.

Setting the Build Environment on OS X*

Setting up the environment on OS X* is much the same as on Linux*. The only difference is that the paths to the environment scripts use the mac subdirectory instead of linux.

Build Procedure

Before the build, extract the examples from the package (zip on Windows*, tgz on Linux*/OS X*) to your working directory.

Note
By default, Intel® C++ Compiler is used only by the ipp_thread_mic example.
Other examples use native compilers, such as Microsoft* Compiler, GCC*, or Clang*.

Build Examples On Windows* Using Microsoft* Visual Studio*

Go to the main examples directory examples_core. Load the Visual Studio* projects directly into Microsoft* Visual Studio*, or build them from the command line:

> devenv <example>.sln /build <configuration>

The <configuration> can be:

  • "Debug|Win32", "Release|Win32", "Debug|x64", or "Release|x64" for all examples;

  • additionally, "Debug_tbb|Win32", "Release_tbb|Win32", "Debug_tbb|x64", or "Release_tbb|x64" for the ipp_resize_mt and ipp_thread examples.

Build Examples On Linux* Using Makefiles

Go to the main examples directory examples_core and run GNU* Make:

$ make [ARCH=ia32|intel64] [CONF=configuration] [clean]

where ARCH and CONF are optional parameters.

Set ARCH=ia32 if you want to build a 32-bit example on a 64-bit operating system. In that case, the Intel IPP environment must be set for the 32-bit architecture:

$ compilervars.sh ia32

or

$ ippvars.sh ia32

and 32-bit GCC libraries must be installed on your system.

CONF=configuration is an optional parameter that selects a release or debug build of the example application, with or without Intel TBB support. The possible configurations are:

  • release - Default configuration: release build without Intel TBB. The compiler option is "-O2".

  • debug - Debug build without Intel TBB. The compiler options are "-O0 -g".

  • release_tbb - Release build with Intel TBB support. The compiler options are "-O2 -DUSE_TBB".

  • debug_tbb - Debug build with Intel TBB support. The compiler options are "-O0 -g -DUSE_TBB". The debug libraries of Intel TBB are used.

The optional parameter clean cleans up the working directory.

You can run the make command either from the top directory of the Intel IPP examples, components/examples_core, or from an example-specific subdirectory, for example, components/examples_core/ipp_fft. In the first case, all examples are built; in the second, only the specific example is built.

Build Examples On OS X* Using Xcodebuild

On OS X*, load the Xcode* project directly into Xcode* or build it from the command line:

$ xcodebuild -configuration <configuration>

Build Examples On Windows* For Intel® System Studio

Execute the <ISS install dir>\bin\compilervars.bat script for the target architecture (ia32 or intel64) to update the PATH variable for the Intel® C/C++ compiler and to set the IPPROOT environment variable. Set the target environment according to the required OS target and toolchain.

Basically, you need to set the SYSROOT and GNU_PREFIX environment variables and update the PATH variable so that the binary tools from the selected toolchain can be executed.

Then, execute the build script:

 > build_iss_win.bat

For example, to build the Intel IPP examples for the Wind River* Linux* target OS, you need to set the following environment variables in your command-line window:

//On Windows* host
//For 32-bit target:
set PATH=C:\<WRL6_path>\x86_64-linux\usr\bin\i586-wrs-linux;%PATH%
set SYSROOT=C:\<WRL6_path>\qemux86
set GNU_PREFIX=i586-wrs-linux-

//For 64-bit target:
set PATH=C:\<WRL6_path>\x86_64-linux\usr\bin\x86_64-wrs-linux;%PATH%
set SYSROOT=C:\<WRL6_path>\qemux86-64
set GNU_PREFIX=x86_64-wrs-linux-
Note
Refer to the corresponding Intel® System Studio documentation on how to set up the environment for different target OSes.

Build Examples On Linux* for Intel® System Studio

Execute the <ISS install dir>/bin/compilervars.sh script for the target architecture (ia32 or intel64) to update the PATH variable for the Intel® C/C++ compiler and to set the IPPROOT environment variable. Set the target environment according to the required OS target and toolchain.

Basically, you need to set the SYSROOT and GNU_PREFIX environment variables and update the PATH variable so that the binary tools from the selected toolchain can be executed. For example, for a standard installation of Yocto Project v.1.7 on Linux*, these environment variables should be:

$ export GNU_PREFIX=i586-poky-linux-
$ export SYSROOT=/opt/poky/1.7/sysroots/i586-poky-linux

For Wind River* Linux* OS the required environment variables are

//On Linux* host
//For 32-bit target:
export PATH=<WRL6_path>/x86_64-linux/usr/bin/i586-wrs-linux:$PATH
export SYSROOT=<WRL6_path>/qemux86
export GNU_PREFIX=i586-wrs-linux-

//For 64-bit target:
export PATH=<WRL6_path>/x86_64-linux/usr/bin/x86_64-wrs-linux:$PATH
export SYSROOT=<WRL6_path>/qemux86-64
export GNU_PREFIX=x86_64-wrs-linux-

Then, execute the build script:

 $ ./build_iss_lin.sh
Note
Refer to the corresponding Intel® System Studio documentation on how to set up the environment for different target OSes.

Buildspace Structure

The build projects and Makefiles generate the necessary work files and the final executable files in the components/build/<configuration> directory.

Examples

The "-T <CPU>" command line option in all examples allows to turn on specific CPU architecture optimization. Together with timing this option helps to see the difference in performance of specific functions for different CPU architectures.

  • -T SSE - Basic optimization for all IA-32/Intel® 64 processors (px/mx code).

  • -T SSE3 - Intel® Streaming SIMD Extensions 3 (Intel® SSE3) optimization (t7/m7 code).

  • -T SSSE3 - Intel® Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) optimization (s8/n8 code on Intel® Atom™ processors).

  • -T SSE41 - Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) optimization (s8/n8 code).

  • -T SSE42 - Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) optimization (s8/n8 code).

  • -T AES - Intel® AES New Instructions (Intel® AES-NI) optimization (p8/y8 code).

  • -T AVX - Intel® Advanced Vector Extensions (Intel® AVX) optimization (g9/e9 code).

  • -T AVX2 - Intel® Advanced Vector Extensions 2 (Intel® AVX2) optimization (h9/l9 code).

  • -T <any other string> - If the string is not recognized, the default optimization for the current CPU is used.

After the example finishes, look at the Intel IPP library names displayed in the top lines of the example's output; they help to determine which particular optimization was used.

Note
This option is absent in Intel IPP external threading example for Intel® Xeon Phi™ coprocessor because Intel® MIC Architecture is the only available architecture in this example.

Command-line help can be obtained using the "-h" option.
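For reference, the optimization reported in the examples' output can also be queried from within an application. The following minimal sketch is not part of the examples; it simply prints the Intel IPP signal processing library name and version after dispatcher initialization:

#include <stdio.h>
#include "ippcore.h"
#include "ipps.h"

/* Minimal sketch: ippInit() selects the best code path for the current CPU,
   and the version functions report which optimized library branch is in use. */
int main(void)
{
    ippInit();
    const IppLibraryVersion* pVer = ippsGetLibVersion();
    printf("%s %s\n", pVer->Name, pVer->Version);
    return 0;
}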

Custom Dynamic Library Example

Intel IPP provides examples demonstrating how to build a custom dynamic library on different operating systems: ipp_custom_dll for Windows*, ipp_custom_so for Linux*, and ipp_custom_dylib for OS X* (see the sections below).

The following tables compare the contents and size of the packages for the end-user application linked with the custom dynamic library and the application linked with the standard Intel IPP dynamic libraries for Windows*, Linux* OS, and OS X*, respectively:

Windows*:

Application linked with Custom DLL:

  ipp_test_app.exe
  ipp_custom_dll.dll

  Package size: 0.1 MB

Application linked with Intel IPP dynamic libraries:

  ipp_test_app.exe
  ippi-9.0.dll
  ippig9-9.0.dll
  ippih9-9.0.dll
  ippip8-9.0.dll
  ippipx-9.0.dll
  ippis8-9.0.dll
  ippiw7-9.0.dll
  ipps-9.0.dll
  ippsg9-9.0.dll
  ippsh9-9.0.dll
  ippsp8-9.0.dll
  ippspx-9.0.dll
  ippss8-9.0.dll
  ippsw7-9.0.dll
  ippcore-9.0.dll

  Package size: 79.3 MB

Linux*:

Application linked with Custom SO:

  ipp_test_app
  libipp_custom.so

  Package size: 0.1 MB

Application linked with Intel IPP dynamic libraries:

  ipp_test_app
  libippi.so.9.0
  libippig9.so.9.0
  libippih9.so.9.0
  libippip8.so.9.0
  libippipx.so.9.0
  libippis8.so.9.0
  libippiw7.so.9.0
  libipps.so.9.0
  libippsg9.so.9.0
  libippsh9.so.9.0
  libippsp8.so.9.0
  libippspx.so.9.0
  libippss8.so.9.0
  libippsw7.so.9.0
  libippcore.so.9.0

  Package size: 113.6 MB

OS X*:

Application linked with Custom DYLIB:

  ipp_test_app
  libipp_custom.dylib

  Package size: 0.1 MB

Application linked with Intel IPP dynamic libraries:

  ipp_test_app
  libippi-9.0.dylib
  libippig9-9.0.dylib
  libippih9-9.0.dylib
  libippip8-9.0.dylib
  libippie9-9.0.dylib
  libippil9-9.0.dylib
  libippiy8-9.0.dylib
  libipps-9.0.dylib
  libippsg9-9.0.dylib
  libippsh9-9.0.dylib
  libippsp8-9.0.dylib
  libippse9-9.0.dylib
  libippsl9-9.0.dylib
  libippsy8-9.0.dylib
  libippcore-9.0.dylib

  Package size: 128.2 MB

As the tables show, using an Intel IPP custom dynamic library provides the following benefits:

  • A smaller package size compared to linking with the standard dynamic libraries.

  • No need to relink the application with a new version of Intel IPP. It is sufficient to rebuild the custom dynamic library from the new Intel IPP version and replace the library in the application package.

Custom DLL Example for Windows*: custom_dll

The ipp_custom_dll example demonstrates how to build the Intel IPP custom dynamic library for an application on Windows* OS.

Example Structure

Intel IPP Custom DLL example consists of the following parts:

  • ipp_custom_dll : custom DLL built from the Intel IPP single-threaded static libraries

  • ipp_test_app: the project test application linked with the custom DLL

How to build custom DLL

To build Intel IPP custom DLL, follow the instructions below:

  1. Define the exact list of Intel IPP functions that the custom DLL should export (see the ipp_custom_dll\src\custom_dll\export.txt file as an example).

  2. Develop the DllMain function and call the IppInit() function in it to initialize the Intel IPP dispatcher at load time (see ipp_custom_dll\src\custom_dll\main.cpp as an example; a minimal sketch follows this list).

  3. Build the DLL using the export list and the DllMain function, and link it with the Intel IPP single-threaded static libraries:

images/core/custom_dll_1.png
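A minimal sketch of such a DllMain is shown below. It assumes the dispatcher is initialized with the ippInit() core function; the example's actual main.cpp may differ:

#include <windows.h>
#include "ippcore.h"

/* Minimal sketch: initialize the Intel IPP dispatcher when the custom DLL is loaded. */
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDLL;
    (void)lpvReserved;
    if (fdwReason == DLL_PROCESS_ATTACH) {
        /* Treat only errors (negative status codes) as fatal. */
        if (ippInit() < ippStsNoErr)
            return FALSE;
    }
    return TRUE;
}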

How to use custom DLL

Link the application with the custom DLL like any other DLL, using its import library. See the test_app subproject as an example: images/core/custom_dll_2.png

Custom SO Example for Linux*: ipp_custom_so

The ipp_custom_so example demonstrates how to build the Intel IPP custom shared library for an application on Linux* OS.

Example Structure

The Intel IPP custom SO example consists of the following parts:

  • ipp_custom_so: custom shared library built from the Intel IPP single-threaded static libraries

  • ipp_test_app: the project test application linked with the custom SO

How to build custom SO

To build the custom SO, follow the instructions below:

  1. Define the exact list of Intel IPP functions that the custom SO should export (see the ipp_custom_so/src/custom_so/export.txt file as an example). Note that the format of the export file may differ depending on the linker tool.

  2. Develop the _init() function and call the IppInit() function in it to initialize the Intel IPP dispatcher at load time (see ipp_custom_so/src/custom_so/main.c as an example; a minimal sketch of such an initialization function follows this list).

  3. Build the custom shared library using the export list and the _init() function, and link it with the Intel IPP single-threaded static libraries.
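A minimal sketch of such an initialization function is shown below. It is written with the GCC constructor attribute, which achieves the same effect as a hand-written _init() without replacing the default startup code; the example's actual main.c may differ:

#include "ippcore.h"

/* Minimal sketch: runs automatically when the shared object is loaded
   and initializes the Intel IPP dispatcher. */
__attribute__((constructor))
static void ipp_custom_so_init(void)
{
    ippInit(); /* select the optimized code path for the current CPU */
}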

How to use custom SO

Link the application with the custom SO like any other shared library.

Custom DYLIB Example for OS X*: ipp_custom_dylib

The ipp_custom_dylib example demonstrates how to build the Intel IPP custom dynamic library for an application on OS X*.

Example Structure

The Intel IPP Custom DYLIB example consists of the following parts:

  • ipp_custom_dylib: custom dynamic library built from the Intel IPP single-threaded static libraries

  • ipp_test_app: the project test application linked with the custom DYLIB

How to build custom DYLIB

To build the custom DYLIB, follow the instructions below:

  1. Define the exact list of Intel IPP functions that the custom DYLIB should export (see the ipp_custom_dylib/src/custom_dylib/export.txt file as an example).

  2. Develop the constructor function and call the IppInit() function in it to initialize the Intel IPP dispatcher at load time (see ipp_custom_dylib/src/custom_dylib/main.c as an example).

  3. Build the custom dynamic library using the export list and the constructor function, and link it with the Intel IPP single-threaded static libraries.

How to use custom DYLIB

Link the application with the custom DYLIB like any other dynamic library.

Multi-thread Image Resize Example: ipp_resize_mt

This Intel IPP example shows how to use the ippiResize functionality in single- and multi-threaded mode. For external multi-threading, the Intel TBB library is used, specifically its parallel_for loop functionality. The multi-threading mode works only if the project is built with Intel TBB support; see Setting the Build Environment for details. The destination image is split into two-dimensional tiles using the Intel TBB blocked_range2d class, for example:

CTBBResize tbbResize(&resizer);
blocked_range2d<Ipp32u, Ipp32u> range(0, dstData.m_iHeight, iGrainH, 0, dstData.m_iWidth, iGrainW);

CTBBResize Class

The core of the ipp_resize_mt example is the CTBBResize class, declared as

class CTBBResize
{
public:
    CTBBResize(CResizer *pResizer)
    {
        m_pResizer = pResizer;
    }
    ~CTBBResize() {};

    void operator()(blocked_range2d<Ipp32u, Ipp32u> &r) const
    {
        m_pResizer->Resize(r.cols().begin(), r.rows().begin(), r.cols().end() - r.cols().begin(),
                    r.rows().end() - r.rows().begin());
    }

private:
    CResizer   *m_pResizer;
};

This class does little more than provide operator() functionality with an Intel TBB blocked_range2d argument. The operator() function determines the boundaries of the image tile assigned to a particular invocation (r.cols().begin(), r.rows().end(), and so on) and calls the Resize function of the CTBBResize::m_pResizer data member. CTBBResize is kept as simple as possible so that all Intel IPP specifics are hidden in the CResizer class.
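A minimal sketch of how these pieces are combined is shown below; it reuses the names from the fragment earlier in this section, and the example's actual code may differ in details such as grain sizes and the partitioner:

#include "ipp.h"
#include "tbb/blocked_range2d.h"
#include "tbb/parallel_for.h"

/* Minimal sketch: split the destination image into 2D tiles and let Intel TBB
   invoke CTBBResize::operator() for each tile in parallel. */
void ResizeParallel(CResizer &resizer, Ipp32u dstWidth, Ipp32u dstHeight,
                    Ipp32u iGrainW, Ipp32u iGrainH)
{
    CTBBResize tbbResize(&resizer);
    tbb::blocked_range2d<Ipp32u, Ipp32u> range(0, dstHeight, iGrainH,
                                               0, dstWidth,  iGrainW);
    tbb::parallel_for(range, tbbResize);
}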

CResizer Class

CResizer is an auxiliary class declared as

class CResizer
{
public:
    CResizer();
    ~CResizer();
    void Init(Image *pSrcData, Image *pDstData, IppiInterpolationType interpolation, Ipp32u iThreads = 1);
    void Resize(int iRangeX, int iRangeY, int iRangeWidth, int iRangeHeight, Ipp32u iThread = 0);
private:
    Image  *m_pSrcData;
    Image  *m_pDstData;
    Ipp32u  m_iThreads;
    IppiInterpolationType m_interpolation;

    IppiResizeSpec_32f *m_pSpec;
    Ipp32s              m_iSpecSize;
    Ipp8u              *m_pInitBuffer;
    Ipp32s              m_iInitSize;
};

Prior to the actual work, you need to initialize the CResizer object by calling the CResizer::Init function. Within Init, the ippiResize context structure (IppiResizeSpec_32f *CResizer::m_pSpec) is allocated and initialized. This structure is used later by the CResizer::Resize function. The allocation size depends on the image size and the resizing algorithm.
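For illustration, a minimal sketch of how such a spec is typically created for a single-channel 8-bit linear resize is shown below. This is not the example's actual Init code, and error handling is omitted:

#include "ippcore.h"
#include "ippi.h"
#include "ipps.h"

/* Minimal sketch: query the spec size, allocate it, and initialize it for a
   linear resize; the spec is then reused by every subsequent Resize call. */
IppiResizeSpec_32f* CreateLinearResizeSpec(IppiSize srcSize, IppiSize dstSize)
{
    Ipp32s specSize = 0, initSize = 0;
    ippiResizeGetSize_8u(srcSize, dstSize, ippLinear, 0 /* no antialiasing */,
                         &specSize, &initSize);
    IppiResizeSpec_32f* pSpec = (IppiResizeSpec_32f*)ippsMalloc_8u(specSize);
    ippiResizeLinearInit_8u(srcSize, dstSize, pSpec);
    return pSpec;
}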

Command Line Options

The command line options for ipp_resize_mt are:

Usage: ipp_resize_mt [-i] InputFile [[-o] OutputFile] [Options]
Options:
  -i <1 arg>  input file name
  -o <1 arg>  output file name
  -r <2 args> destination resolution (width height)
  -k          do not keep aspect ratio
  -p <1 arg>  interpolation:
              0 - Nearest
              1 - Linear (default)
              2 - Cubic
              3 - Lanczos

  -s          suppress window output
  -w <1 arg>  minimum test time in milliseconds
  -l <1 arg>  number of loops (overrides test time)
  -T <1 arg>  target Intel IPP optimization (SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AES, AVX, AVX2)
  -h          print help and exit

Currently, the ipp_resize_mt example works only with bitmap (BMP) files.

The options are as follows:

  • -i <arg> - The name of the source image BMP file to process. The "-i" switch itself is optional.

  • -o <arg> - The name of the output BMP file with the resized image. The "-o" switch is optional.

  • -r <2 args> - Destination image resolution in the format "-r width height". Note: without the "-k" option (see below), the example keeps the aspect ratio of the source image, so only the first "-r" argument (width) takes effect. The destination height is calculated as destination_height = destination_width * source_aspect_ratio, where source_aspect_ratio is source_height / source_width. For example, a 1200x900 source resized to a width of 1024 produces a 1024x768 destination image.

  • -k - Do not keep the aspect ratio. With this option, the example creates the destination image with exactly the resolution specified in the "-r width height" option.

  • -t <1 arg> - The number of execution threads used for resizing. By default ("-t 0"), the number of threads is equal to the number of CPU cores. (*)

  • -p <1 arg> - Resizing interpolation method: a one-digit number from 0 to 3.

  • -w <1 arg> - The minimum execution time, which affects performance measurement accuracy. This option has a similar effect to the "-l <arg>" option (see below). Time is specified as a floating-point value in milliseconds (e.g. "-w 1000" for a one-second duration).

  • -l <1 arg> - The number of internal execution loops, which also affects performance measurement accuracy. <arg> is a positive integer value.

  • -h - Shows the above help message.

Note
(*) "-t <1 arg>" option will be effective and visible in usage message only if ipp_resize_mt example is built with Intel TBB support. Otherwise, the sequential resize will be used.

Example Output

The output messages during example execution show the number of resizing threads used, the source and destination image resolutions, the number of loops executed, and the execution time in seconds and CPE (CPU clocks per image element), as in the example below:

Threads: 4
Src size: 1200x900
Dst size: 1024x768

Loops:      1
Time total: 0.003735s
Loop avg:   0.003735s
CPE total:  2.917901
CPE avg:    2.917901

Image Linear Transform Example: ipp_fft

The Intel IPP image linear transform example (ipp_fft) shows how to use the Intel IPP image linear transforms: Fourier transforms (fast and discrete) and the discrete cosine transform (DCT).

Example Structure

The example contains three classes (FFT, DFT, and DCT) derived from the base class Transform. Each class implements its own Init, Forward, and Inverse methods with the corresponding functionality.
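As an illustration of this structure, a hypothetical declaration of the base class and its descendants could look as follows; the method signatures here are an assumption for illustration only, and the actual classes in the example may differ:

#include "ippi.h"

/* Hypothetical illustration of the class structure described above. */
class Transform
{
public:
    virtual ~Transform() {}
    virtual IppStatus Init(IppiSize roiSize) = 0;              /* allocate/initialize the transform spec */
    virtual IppStatus Forward(const Ipp8u* pSrc, int srcStep,
                              Ipp32f* pDst, int dstStep) = 0;  /* forward transform */
    virtual IppStatus Inverse(const Ipp32f* pSrc, int srcStep,
                              Ipp8u* pDst, int dstStep) = 0;   /* inverse transform */
};

class FFT : public Transform { /* implemented with the ippiFFT* functions */ };
class DFT : public Transform { /* implemented with the ippiDFT* functions */ };
class DCT : public Transform { /* implemented with the ippiDCT* functions */ };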

Additionally, the ipp_fft example contains source code for performance measurement and performance statistics output.

Command Line Options

To display help for the ipp_fft command-line options, run the example with the "-h" option.

Usage: ipp_fft [-i] InputFile [[-o] OutputFile] [Options]
Options:
  -i <1 arg>  input file name
  -o <1 arg>  output file name
  -d <1 arg>  save coefficients matrix
  -x <1 arg>  create coefficients map image
  -m <1 arg>  image transform method: FFT (default), DFT, DCT
  -b <2 args> apply transform by blocks (width height, 0 0 by default)
              block size will be set to first lower power of 2 value

  -n <1 arg>  algorithm mode: 0 - default, 1 - fast, 2 - accurate
  -s          suppress window output
  -w <1 arg>  minimum test time in milliseconds
  -l <1 arg>  number of loops (overrides test time)
  -T <1 arg>  target Intel IPP optimization (SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AES, AVX, AVX2)
  -h          print help and exit

All command-line switches except the source image file name are optional.

The options are as follows:

  • -i <1 arg> - The name of the source image BMP file to process. The "-i" switch itself is optional.

  • -o <1 arg> - The name of the output BMP file.

  • -d <1 arg> - The name of a binary file to save the transformation coefficients. They are saved as 32-bit floating-point values.

  • -x <1 arg> - The name of the output file for the transformation coefficients. The coefficients are saved normalized in bitmap (BMP) format.

  • -m <1 arg> - The linear transform method to use. Examples: "-m FFT", "-m DFT", "-m DCT".

  • -b <2 args> - Apply the linear transformation by image blocks (tiles). By default, the block is equal to the whole image. Blocks are defined by the "-b width height" argument (for example, "-b 128 128"). Using image tiles for the transformation can improve overall performance, because with small tiles all required data fits in the CPU cache.

  • -n <1 arg> - Linear transform algorithm hint (default, fast, or accurate).

  • -w <1 arg> - The minimum execution time, which affects performance measurement accuracy. This option has a similar effect to the "-l <arg>" option (see below). Time is specified as a floating-point value in milliseconds (e.g. "-w 1000" for a one-second duration).

  • -l <1 arg> - The number of internal execution loops, which also affects performance measurement accuracy. <arg> is a positive integer value.

  • -T <1 arg> - Target Intel IPP optimization.

  • -h - Shows the above help message.

External Threading Example: ipp_thread

As internal threading is deprecated in Intel IPP functions (see the Intel® IPP deprecation strategy at http://software.intel.com/sites/products/ipp-deprecated-features-feedback), it is important to know how a generic Intel IPP function can be threaded externally. This is the purpose of the Intel IPP external threading example (ipp_thread).

This example shows how to thread an image harmonization filter. The harmonization filter steps are:

  • Generate gray image (src);

  • Filterbox 7x7 with anchor 1,1 (tmp = filterbox(src));

  • Multiply by constant 255 (tmp = tmp * 255);

  • Subtract from source (dst = tmp - src);

  • Multiply by constant 205 (dst = dst * 205);

  • Threshold < 6 or > 250 (dst = threshold(6, 250)).

The example demonstrates 3 methods of harmonization filtering:

  • Direct (serial) execution of the above steps on the whole image.

  • Using native threads: WinAPI threads on Windows*, pthreads on Linux* OS and OS X*.

  • The Intel TBB usage model.

The threading methods use a simple data-parallel approach in which the source image is split into slices and each slice is processed in a separate thread in parallel. This is possible because there is no data dependency between the slices.
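The sketch below illustrates the slicing idea itself using C++11 threads; note that the example actually uses the VM library or Intel TBB rather than std::thread:

#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

/* Illustration only: split the image rows into slices and process each slice
   in its own thread; no locking is needed because the slices do not overlap. */
void ProcessSlices(int imageHeight, int numThreads,
                   const std::function<void(int firstRow, int numRows)> &processSlice)
{
    std::vector<std::thread> workers;
    const int rowsPerSlice = (imageHeight + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        const int firstRow = t * rowsPerSlice;
        const int numRows  = std::min(rowsPerSlice, imageHeight - firstRow);
        if (numRows <= 0)
            break;
        workers.emplace_back(processSlice, firstRow, numRows);
    }
    for (std::thread &w : workers)
        w.join();
}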

Example Structure

In general, the ipp_thread example consists of the main function and three processing functions: HarmonizeSerial, HarmonizeNativeParallel, and HarmonizeTBBParallel.

Before filtering, the source data is copied into a bigger image in order to set up a border around the source image data (necessary for the ippiFilterBox_8u_C1R function).

Threading Using Native Threads

In the HarmonizeNativeParallel function, thread control is organized using VM library functions to hide operating system specifics.

The main thread processing function is named HarmonizeNativeSlice. It does exactly the same job as the HarmonizeSerial function but on a source image slice only.

Synchronization of threads is done using mutex and event objects. When the last thread finishes, it sets the vm_event* ThreadParam::pLastThreadStop flag, notifying the thread-controlling function that it can continue execution.

Threading Using Intel® TBB

The HarmonizeTBBParallel function is quite simple. The Intel TBB class used with the parallel_for construct is

class SliceHarmonize {
public:
    SliceHarmonize(int numIter, Image &srcBorder, Image &tmpImage,
        IppiSize srcRoi, Image &srcImage, Image &dstImage) :
            m_numIter(numIter),
            m_srcBorder(srcBorder),
            m_tmpImage(tmpImage),
            m_srcRoi(srcRoi),
            m_srcImage(srcImage),
            m_dstImage(dstImage) { }
    void operator() (const tbb::blocked_range<int>& height) const;

protected:
    Image           &m_srcBorder;
    Image           &m_tmpImage;
    Image           &m_srcImage;
    Image           &m_dstImage;
    int             m_numIter;
    IppiSize        m_srcRoi;
};

Filtering is done in the SliceHarmonize::operator() function in the same way as in the serial and native methods, on slices defined by the tbb::blocked_range<int>& height function argument.

In the Intel TBB model, thread synchronization is done automatically.

Command Line Options

The command-line options can be displayed using the "-h" option:

Usage: ipp_thread [-i] InputFile [[-o] OutputFile] [Options]
Options:
  -i <1 arg> input file name
  -o <1 arg> output file name
  -m <1 arg> threading method: tbb (default), native (*)
  -t <1 arg> number of threads (0 - auto, 0 by default)
  -s         suppress window output
  -w <1 arg> minimum test time in milliseconds
  -l <1 arg> number of loops (overrides test time)
  -T <1 arg> target Intel IPP optimization (SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AES, AVX, AVX2)
  -h         print help and exit

The options are as follows:

  • -i <1 arg> - The name of the source image BMP file to process. The "-i" switch itself is optional.

  • -o <1 arg> - The output bitmap (BMP) file name (optional).

  • -m <1 arg> - Threading method: "-m tbb" starts parallelization using Intel TBB, "-m native" starts parallelization using native OS threads. (*)

  • -t <1 arg> - The number of threads to be used in parallelization.

  • -w <1 arg> - The minimum execution time, which affects performance measurement accuracy. This option has a similar effect to the "-l <arg>" option (see below). Time is specified as a floating-point value in milliseconds (e.g. "-w 1000" for a one-second duration).

  • -l <1 arg> - The number of internal execution loops, which also affects performance measurement accuracy. <arg> is a positive integer value.

  • -T <1 arg> - Target Intel IPP optimization.

  • -h - Shows the above help message.

Note
(*) "-m <1 arg>" option will be effective and visible in usage message only if ipp_thread example is built with Intel® TBB support. Otherwise, the option will not be visible and native threading will be used.

Multi-threading Example for Intel® Xeon Phi™ Coprocessor: ipp_thread_mic

Starting with Intel® C++ Composer 2013, the Intel® software development tools support Intel® Xeon Phi™ manycore processor accelerators. The Intel® Manycore Platform Software Stack (Intel® MPSS) SDK contains libraries and cross-compilation tools to prepare native and "offload" executables for Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Along with the IA-32 and Intel® 64 versions, the Intel® IPP libraries contain native Intel® MIC Architecture libraries, which allow you to build applications that run entirely (natively) or partially (offload) on Intel® MIC Architecture. Note that these libraries are 64-bit libraries for the Intel® 64 architecture only.

Native Applications

The process of building an Intel® MIC Architecture application is similar to building ordinary Intel® IPP-based applications. The differences are the additional compiler options ("-mmic" for Linux* OS and "-Qmic" for Windows*) and the Intel IPP libraries used during the linking phase.

With the Intel® C++ Compiler, the required directories with Intel® MIC Architecture-specific libraries are included automatically when the Intel® MIC-specific compiler options are used. For example, suppose that we have the test.c source file with Intel IPP function calls. With the "-mmic" and "-ipp=common" options, the final Linux* compiler options are:

$ ./icc -# -mmic -ipp=common test.c 2>&1 | grep ipp
    -I/opt/compilers_and_libraries_2016.x.yyy/linux/ipp/include \
    /opt/compilers_and_libraries_2016.x.yyy/linux/ipp/include \
    "-mGLOB_options_string=-# -mmic -ipp=common" \
    -L/opt/compilers_and_libraries_2016.x.yyy/linux/ipp/lib/intel64_lin_mic \
    -lippcv \
    -lippch \
    -lippcc \
    -lippdc \
    -lippi \
    -lipps \
    -lippvm \
    -lippcore \
    -lippcv \
    -lippch \
    -lippcc \
    -lippdc \
    -lippi \
    -lipps \
    -lippvm \
    -lippcore \

Here,

  • "-I/opt/compilers_and_libraries_2016.x.yyy/linux/ipp/include" compiler option is used for additional ".h" files search path;

  • "-lippcore" option is to add libippcore.a and libippcore.so library files;

  • "-L/opt/compilers_and_libraries_2016.x.yyy/linux/ipp/lib/intel64_lin_mic" is for additional library search path;

  • "x" and "yyy" are placeholders for specific major and minor version numbers.

The linker options correspond to x86_64-k1om-linux-ld cross-linker, one of the Intel® MPSS cross-compilation tools. Your Intel® Parallel Studio XE Composer Edition install directory may differ from the one mentioned in the example above.

Code Example

This chapter explains how to build and execute the following example for Intel® Xeon Phi™ on Linux* OS.

#include <stdio.h>
#include <string.h>
#include "ipps.h"
const char* test_str = "Hello Intel(R) IPP on Intel(R) Xeon Phi(TM)!";
int main()
{
    int len;
    Ipp8u* ipp_buf;

    len = (int)strlen(test_str);
    ipp_buf = ippsMalloc_8u(len + 1);                  /* +1 for the terminating zero */
    ippsCopy_8u((const Ipp8u*)test_str, ipp_buf, len);
    ipp_buf[len] = 0;                                  /* terminator stays inside the buffer */
    printf("Test string: %s\n", ipp_buf);
    ippsFree(ipp_buf);
    return 0;
}

To build this example, you need to execute the following command on Linux* OS:

  $ icc -mmic -ipp=common test.c

The output executable file a.out is a native Intel® MIC Architecture application. To execute this application on Intel® Xeon Phi™, you need to copy it to the coprocessor file system:

  $ scp a.out mic0:

Then, log in to the Intel® Xeon Phi™ coprocessor and execute it:

  $ ssh mic0
  $ ./a.out
  Test string: Hello Intel(R) IPP on Intel(R) Xeon Phi(TM)!
  $

Offload Applications

There is another Intel® MIC Architecture programming technique based on Intel® C/C++ Compiler offload feature (see Intel® Many Integrated Core Architecture (Intel® MIC Architecture) chapter).

From the user's point of view, offload compilation and execution mean that you can declare part of your application as offload, that is, intended for execution off the central CPU, build the corresponding application, and start it on the "host" computer system.

The offload run-time system and compiler pragma provide automatic methods to transfer Intel® MIC Architecture-targeted executable parts of binary code and related data to/from Intel® Xeon Phi™ coprocessor. Basically, you only need to specify which parts of your application (including data) are targeted for Intel® Xeon Phi™.

In practice, however, it is more complicated. The main limitation is that the offload data transfer interface is intended for simple objects (scalar variables, plain arrays, and so on). This means that the variables used to transfer data between the host CPU (memory) and Intel® Xeon Phi™ must be simple, of known size, understandable to the compiler, and bitwise-copyable (that is, treated identically on both sides of the CPU-Intel® Xeon Phi™ interconnect).

This means that, in general, you cannot have, for example, a C++ class in which some of the function or data members are offload and the rest of the class is not.

class foo {
public:
    foo();
    ~foo();

    void host_function();
    __declspec(target(mic)) void mic_function();
private:
    int cpu_hosted;
    __declspec(target(mic)) int mic_hosted;
};

Currently, the above class is impossible in the offload environment, because you cannot describe the necessary data transfers in terms of the compiler's "#pragma offload" clauses. If you want to have C++ classes in an offload application, you need to place the corresponding objects either entirely on the host (CPU) side of the application or entirely on the Intel® Xeon Phi™ side.

Compiler Command Line for Offload Compilation

The Intel® C++ Compiler recognizes "offloaded" source code and takes specific actions to build the complete application with the "offload" feature. If you want to build such an application manually, or if the application requires Intel® MIC Architecture-specific libraries, you can use the commands below.

Note
The Intel® C++ Compiler has a specific switch to pass the desired options to the offload linker. For example, on Linux* OS the following command line can be used for an application that calls both Intel IPP functions for the CPU and Intel IPP functions for Intel® MIC Architecture (in the offload parts of the code):
  $ icc -I$IPPROOT/include -L$IPPROOT/lib/intel64_lin -lipps -lippcore \
        -qoffload-option,mic,compiler,"-L$IPPROOT/lib/intel64_lin_mic -lippi -lipps -lippcore" test.cpp

This command uses different sets of Intel IPP libraries:

  • libipps.a and libippcore.a from $IPPROOT/lib/intel64_lin for CPU part;

  • libippi.a, libipps.a and libippcore.a from $IPPROOT/lib/intel64_lin_mic for offload code.

If your application uses Intel IPP function calls only in offload parts, you do not need to link to Intel IPP libraries for CPU and the command line is simpler:

  $ icc -I$IPPROOT/include -qoffload-option,mic,compiler,"-L$IPPROOT/lib/intel64_lin_mic -lippi -lipps -lippcore" test.cpp

In the command line above only Intel® MIC Architecture libraries are specified for linking.

On Windows, the corresponding Intel® C++ Compiler option is:

  > icl -I"%IPPROOT%\include" \
       /Qoffload-option,mic,link,"-L\"%IPPROOT%\"/lib/intel64_lin_mic -lippi -lipps -lippcore" test.cpp
Note
The compiler option syntax in quotes follows the Linux* style, because the cross-compilation tools for Intel® MIC Architecture on Windows* are still Linux-like.

Modifying example for Offload Mode

To adjust the example provided in External Threading Example (ipp_thread) for use in offload mode, complete the following steps:

  1. Remove the Intel TBB parts to make the example clearer and more C-like. To do this, delete the source code lines within the following constructions:

#ifdef USE_TBB
...
#endif
  2. Separate the Intel® MIC Architecture and host CPU source code. The Intel® C++ Compiler has a special macro definition, __MIC__, to distinguish between the CPU and Intel® MIC Architecture compilation phases. It enables the co-existence of both source code parts in a single source file. For example:

...
#if defined(__MIC__)
static int mic_specific_variable;
#endif
...
#if !defined(__MIC__)
static int host_specific_variable;
#endif

Remember that the offload compiler performs two compilation passes over the source file:

  • It compiles the source file for the host CPU with the __MIC__ macro undefined and produces an ordinary object file;

  • It compiles the same source file for Intel® MIC Architecture with the __MIC__ macro defined and produces an Intel® MIC Architecture object file named <obj_file_name>MIC.o.

In our example, we are going to have all Intel IPP functions and data on the Intel® Xeon Phi™ side, so we need to declare the Intel IPP specifics as offload. This can be done in the following way:

#pragma offload_attribute(push, target(mic))
#include "ipp.h"
#pragma offload_attribute(pop)

Then, we want to put the example classes Harmonize and HarmonizeVM on the Intel® Xeon Phi™ side:

#pragma offload_attribute(push, target(mic))
class Harmonize
{
public:
    Harmonize()
...
};
class HarmonizeVM
{
...
};
#pragma offload_attribute(pop)

These classes should not be visible from the host side of the application; otherwise, you can face problems with the offload compiler's behavior.

Because of this, we need to move the

int main()
...
    Harmonize *pHarmonize = 0;

variable from main()'s local storage to the coprocessor's static variable space:

__declspec(target(mic)) Harmonize *pHarmonize = 0;
int main()
...
  3. Add offload functions with simple interfaces and arguments to control the harmonization objects from the host:

#if defined(__MIC__)
// Definition of function to create harmonization objects on coprocessor
__declspec(target(mic)) void Create(ParallelMethod par_method, int threads, unsigned int width,
                        unsigned int height, unsigned int step, unsigned int samples)
{
    switch(par_method) {
    case PAR_SEQ:
        Create<Harmonize>(threads, width, height, step, samples);
        break;
    case PAR_VM:
        Create<HarmonizeVM>(threads > 0? threads : VM_THREADS, width, height, step, samples);
        break;
    }
}
#else
// We need to declare "Create" function as "offload" function on host side
__declspec(target(mic)) void Create(ParallelMethod par_method, int threads, unsigned int width,
                        unsigned int height, unsigned int step, unsigned int samples);
#endif
...
int main()
...
#pragma offload target(mic) in(iThreads, mic_width, mic_height, mic_step, mic_samples)
            Create(PAR_SEQ, iThreads, mic_width, mic_height, mic_step, mic_samples);
...

Then we will need the functions to:

  • Pass image parameters (host memory address, size) to Intel® Xeon Phi™ (function SetImages);

  • Start harmonization on Intel® Xeon Phi™ (function HarmonizeImage);

  • Clear Harmonize objects on Intel® Xeon Phi™ to release dynamic memory (function ClearHarmonizeObject).

All these functions take simple, C-like parameters such as scalar variables and plain arrays.

Data Transfer Hints

One of the factors that most affects heterogeneous application performance is the transfer rate of user data over the system bus between main computer memory and the accelerator (in our case, the Intel® Xeon Phi™ coprocessor). While the transfer itself is reasonably fast (tens of GB per second), initiating a transfer operation is very expensive. Therefore, one of the most important tasks in heterogeneous development is to minimize the number of data transfer transactions.

In the case of offload compilation, the data transfers between the host and Intel® Xeon Phi™ are not always visible: the compiler generates the data transfers, memory buffer allocations, and de-allocations implicitly, and it is important to keep control over this activity.

For example, assume the following pseudo-code program:

// Offload and host part initializations
...
// Offload objects creation
#pragma offload(target(mic)) out(status)
    status = CreateObjects();
...
#pragma offload(target(mic)) in(host_image_src_ptr : length(src_size)) in(src_src)
    status = SetImages(host_image_src_ptr, src_size);
...
    for(int i=0; i < num_loops; i++) {
#pragma offload(target(mic)) in(dst_size) out(host_image_dst_ptr : length(dst_size)) out(status)
        status = HarmonizeImage(host_image_dst_ptr, dst_size);
        // host_image_dst_ptr buffer is not really used at host side
        ...
    }

The problems with data transfers in the example are:

  • The offload pragma with an in clause makes the compiler generate hidden code that creates a temporary buffer on the coprocessor side and organizes the data transfer between the host buffer and that temporary buffer. Basically, this means that the __declspec(target(mic)) SetImages(void* ptr_from_host, int buf_size) function is executed on the coprocessor with the temporary buffer address in the ptr_from_host parameter. You need to copy data from the temporary buffer to the real buffer previously allocated on the coprocessor side.

  • If you copy the buffer address to the coprocessor's static storage in order to use it later,

__declspec(target(mic)) static void* saved_ptr;

__declspec(target(mic)) SetImages(void* ptr_from_host, int buf_size)
{
    saved_ptr = ptr_from_host;
}
__declspec(target(mic)) OtherMicFunction()
{
    local_mic_function(saved_ptr);
}

you will get a segmentation fault, because after the SetImages function exits, the buffer behind ptr_from_host no longer exists, unless you take special measures such as

#pragma offload(target(mic)) in(host_image_src_ptr : length(src_size) free_if(0)) in(src_src)
    status = SetImages(host_image_src_ptr, src_size);
  • The second problem is in the statement

#pragma offload(target(mic)) in(dst_size) out(host_image_dst_ptr : length(dst_size)) out(status)
        status = HarmonizeImage(host_image_dst_ptr, dst_size);
        // host_image_dst_ptr buffer is not really used at host side

Even if you do not use host_image_dst_ptr on the host side, the compiler generates a buffer allocation to transfer data from Intel® Xeon Phi™ to the host_image_dst_ptr buffer. The loop that should accumulate the Intel® MIC Architecture function execution time then also includes the data transfer time, which you may not want. One solution is to not specify the output buffer in the HarmonizeImage call, but to move the final data transfer from Intel® Xeon Phi™ into the last function, ClearHarmonizeObject. In this case, another helpful decision is to specify the output buffer address as an argument to the SetImages function with the attribute free_if(0), which forces the offload compiler to keep the temporary buffer and its pointer.

The final SetImages function call from host looks like

#pragma offload target(mic) in(pmicSrc: length(srcData.m_iBufferSize) RETAIN) \
                            out(pmicDst: length(dstData.m_iBufferSize) RETAIN)
        SetImages(pmicSrc, pmicDst);

where RETAIN is a macro whose name is more understandable than free_if(0):

#define RETAIN  free_if(0)

Finally, we need to transfer the harmonized data from Intel® Xeon Phi™ back to the host. This is done with the last function call:

// Get output image from coprocessor
#pragma offload target(mic) out(pmicDst: length(dstData.m_iBufferSize) FREE)
        ClearHarmonizeObject(pmicDst);

FREE here is the macro

#define FREE    free_if(1)

Including VM Functions into Main Function Source File

In fact, it is not necessary to have all offloaded functions in a single source file. You can split Intel® MIC Architecture-related components in the following way:

File foo.c

__declspec(target(mic)) void foo()
{
    // Do something on Intel(R) Xeon Phi(TM)
}

File main.c

__declspec(target(mic)) void foo(); // External Intel(R) MIC function declaration
...
int main() {
...
#pragma offload target(mic)
    foo();                          // Call 'foo' at Intel(R) Xeon Phi(TM)
...
}

All functions in our example were combined into a single source file to simplify the build scripts of the Intel IPP examples build system. The "vm_*" functions are the same as in the ipp_thread example, with the following changes:

  • Attribute __declspec(target(mic)) added to function definitions;

  • Only the Linux*/Unix* version of the source code remains, because the Intel® MIC Architecture compiler pass is Linux-based and the target operating system running on the Intel® Xeon Phi™ coprocessor is a Linux* clone;

  • All function definitions are under #if defined(__MIC__) conditional compilation statements.

Example Build

To build the Intel® MIC Architecture example, execute the build script as described in the Build Procedure chapter, with the following specifics.

Linux* OS

$ make CONF=release

Windows* OS

> nmake -f Makefile.mic.win

Example Execution

To execute the example, place a source bitmap file in the directory where ipp_thread_mic resides and execute the command specified in Command Line Options. The following has been changed in the command-line options:

  • "-T" option has been removed as Intel® MIC Architecture is the only CPU architecture available for this example;

  • "-m <1 arg>" option has been removed as "native" is currently the only one threading method.

The default number of threads on Intel® Xeon Phi™ is 32.

Technical Support

If you did not register your Intel® software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates and upgrades for the duration of the support term.

For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks, and other support questions, please visit (http://www.intel.com/software/products/support).

Note
If your distributor provides technical support for this product, please contact them rather than Intel.

For technical information about the Intel® IPP library, including FAQs, tips and tricks, and other support information, please visit the Intel® IPP forum (http://software.intel.com/en-us/forums/intel-integrated-performance-primitives) and browse the Intel® IPP knowledge base (http://software.intel.com/en-us/articles/intel-ipp-kb/all).