On Linux® and Macintosh operating systems, you can use an MPI build that differs from the one provided with Parallel Computing Toolbox™. This topic outlines the steps for creating an MPI build for use with the generic scheduler interface. If you already have an alternative MPI build, proceed to Use Your MPI Build.
Unpack the MPI sources into the target file system on your machine. For example, suppose you have downloaded mpich2-distro.tgz and want to unpack it into /opt for building:
# cd /opt
# mkdir mpich2 && cd mpich2
# tar zxvf path/to/mpich2-distro.tgz
# cd mpich2-1.4.1p1
Build your MPI using the --enable-shared option. This is vital: you must build a shared-library MPI that is binary compatible with MPICH2-1.4.1p1 for R2013b to R2018b, or with MPICH3.2.1 for R2019a and later. For example, the following commands build an MPI with the nemesis channel device and the gforker launcher.
# ./configure -prefix=/opt/mpich2/mpich2-1.4.1p1 \
    --enable-shared --with-device=ch3:nemesis \
    --with-pm=gforker 2>&1 | tee log
# make 2>&1 | tee -a log
# make install 2>&1 | tee -a log
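The release-to-MPICH compatibility rule stated in this step can be written down as a small lookup, which may help when one build script serves several MATLAB releases. This is only a sketch of the rule as given in this topic: required_mpich is a hypothetical helper name, and the release string format (for example, R2019a) is an assumption about how you pass the release in.

```shell
#!/bin/sh
# Sketch: print the MPICH version an MPI build must be binary compatible
# with, following the rule in this topic. required_mpich is a hypothetical
# helper name, not part of MATLAB or MPICH.
required_mpich() {
    case $1 in
        R20[0-9][0-9][ab]) ;;
        *) echo "unrecognized release: $1" >&2; return 2 ;;
    esac
    year=${1#R}          # e.g. 2013b
    year=${year%[ab]}    # e.g. 2013
    half=${1#R????}      # a or b
    if [ "$year" -ge 2019 ]; then
        echo "MPICH3.2.1"
    elif [ "$year" -gt 2013 ] || { [ "$year" -eq 2013 ] && [ "$half" = b ]; }; then
        echo "MPICH2-1.4.1p1"
    else
        echo "release $1 is not covered by this topic" >&2
        return 1
    fi
}
```

Calling required_mpich R2018b prints MPICH2-1.4.1p1, and required_mpich R2019a prints MPICH3.2.1. Releases earlier than R2013b return a nonzero status because this topic does not say what they require.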
When your MPI build is ready, follow these steps to use it with a generic scheduler and get your cluster working with the new build.
Test your build by running the mpiexec executable. The build should be ready to test if its bin/mpiexec and lib/libmpich.so are available in the MPI installation location.
Following the example in Build MPI, /opt/mpich2/mpich2-1.4.1p1/bin/mpiexec and /opt/mpich2/mpich2-1.4.1p1/lib/libmpich.so are ready to use, so you can test the build with:
$ /opt/mpich2/mpich2-1.4.1p1/bin/mpiexec -n 4 hostname
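Before running the test, it can be useful to confirm that the two files named in this step are actually present under the installation prefix. The following is a minimal sketch; check_mpi_build is a hypothetical helper name, and the prefix you pass is whatever you gave to configure.

```shell
#!/bin/sh
# Sketch: confirm that an MPI install prefix contains the two files this
# step names. check_mpi_build is a hypothetical helper, not an MPICH tool.
check_mpi_build() {
    prefix=$1
    status=0
    if [ ! -x "$prefix/bin/mpiexec" ]; then
        echo "missing or not executable: $prefix/bin/mpiexec" >&2
        status=1
    fi
    if [ ! -r "$prefix/lib/libmpich.so" ]; then
        echo "missing or not readable: $prefix/lib/libmpich.so" >&2
        status=1
    fi
    return $status
}
```

With the example prefix, you would call check_mpi_build /opt/mpich2/mpich2-1.4.1p1 and run the mpiexec test only if it returns success.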
Create an mpiLibConf (Parallel Computing Toolbox) function to direct Parallel Computing Toolbox to use your new MPI. Write your mpiLibConf.m to return the appropriate information for your build. For example:
function [primary, extras] = mpiLibConf
primary = '/opt/mpich2/mpich2-1.4.1p1/lib/libmpich.so';
extras = {};
The primary path must be valid on the cluster, and your mpiLibConf.m file must be higher on the cluster workers’ path than matlabroot/toolbox/parallel/mpi. (Sending mpiLibConf.m as an attached file for this purpose does not work. You can get the mpiLibConf.m function on the worker path either by moving the file into a folder on the path, or by having the scheduler use cd in its command so that it starts the MATLAB® worker from within the folder that contains the function.)
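The cd approach works because MATLAB searches the current folder before the folders on its path. As a sketch of what that might look like in a scheduler command, the helper below refuses to start unless the folder really contains mpiLibConf.m. The name start_from_conf_dir is hypothetical, and the command you pass after the folder is whatever your scheduler integration already uses to start a worker.

```shell
#!/bin/sh
# Sketch: start a command from the folder that holds mpiLibConf.m.
# start_from_conf_dir is a hypothetical helper; the worker command is
# supplied by the caller and is not defined here.
start_from_conf_dir() {
    conf_dir=$1
    shift
    if [ ! -f "$conf_dir/mpiLibConf.m" ]; then
        echo "no mpiLibConf.m in $conf_dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$conf_dir" && "$@" )
}
```

The folder must be visible at the same location on every node that runs a worker, just as the primary library path must be.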
Determine necessary daemons and command-line options.
Determine all necessary daemons (often something like mpdboot or smpd). The gforker build example in this section uses an MPI that needs no services or daemons running on the cluster, but it can use only the local machine.
Determine the correct command-line options to pass to mpiexec.
To set up your cluster to use your new MPI build, modify your communicating job wrapper script to pick up the correct mpiexec. Additionally, there might be a stage in the wrapper script where the MPI process manager daemons are launched.
The communicating job wrapper script must:
Determine which nodes are allocated by the scheduler.
Start required daemon processes. For example, for the MPD process manager this means calling "mpdboot -f <nodefile>".
Define which mpiexec executable to use for starting workers.
Stop the daemon processes. For example, for the MPD process manager this means calling "mpdallexit".
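The four duties above can be sketched as a skeleton for an MPD-based build. This is an illustration under stated assumptions, not one of the shipped sample scripts: MPI_PREFIX, DRY_RUN, run, count_nodes, and wrapper_main are hypothetical names, the node file is assumed to list one host per line, and passing the daemon count to mpdboot with -n is an addition to the bare "mpdboot -f <nodefile>" call mentioned above. A gforker build, like the example in this topic, would skip the two daemon steps.

```shell
#!/bin/sh
# Sketch of the four wrapper duties for an MPD-based MPI build.
# All names here (MPI_PREFIX, DRY_RUN, run, count_nodes, wrapper_main)
# are illustrative assumptions, not toolbox or MPICH API.
MPI_PREFIX=${MPI_PREFIX:-/opt/mpich2/mpich2-1.4.1p1}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Count the unique hosts in a node file that lists one host per line.
count_nodes() {
    sort -u "$1" | grep -c .
}

# wrapper_main NODEFILE MPIEXEC_ARGS...
wrapper_main() {
    nodefile=$1
    shift
    nodes=$(count_nodes "$nodefile")                  # 1. allocated nodes
    echo "using $nodes node(s)" >&2
    run mpdboot -n "$nodes" -f "$nodefile" || return 1  # 2. start daemons
    run "$MPI_PREFIX/bin/mpiexec" "$@"                # 3. chosen mpiexec
    status=$?
    run mpdallexit                                    # 4. stop daemons
    return $status
}
```

Setting DRY_RUN=1 before calling wrapper_main with a node file and the mpiexec arguments prints the three commands instead of running them, which is a convenient way to review the command lines before submitting a job. The daemons are stopped even when mpiexec fails, and the mpiexec exit status is what the function returns.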
For examples of communicating job wrapper scripts, see Sample Plugin Scripts (Parallel Computing Toolbox).