The generic scheduler interface provides complete flexibility to configure the interaction of the MATLAB® client, MATLAB workers, and a third-party scheduler. The plugin scripts define how MATLAB interacts with your setup.
The following table lists the supported plugin script functions and the stage at which they are evaluated:
File Name | Stage |
independentSubmitFcn.m | Submitting an independent job |
communicatingSubmitFcn.m | Submitting a communicating job |
getJobStateFcn.m | Querying the state of a job |
cancelJobFcn.m | Canceling a job |
cancelTaskFcn.m | Canceling a task |
deleteJobFcn.m | Deleting a job |
deleteTaskFcn.m | Deleting a task |
postConstructFcn.m | After creating a parallel.cluster.Generic instance |
These plugin scripts are evaluated only if they have the expected file name and are located in the folder specified by the PluginScriptsLocation property of the cluster. For more information about how to configure a generic cluster profile, see Configure Using the Generic Scheduler Interface (MATLAB Parallel Server).
Note
The independentSubmitFcn.m file must exist to submit an independent job, and the communicatingSubmitFcn.m file must exist to submit a communicating job.
To support usage of the generic scheduler interface, plugin scripts are available for several third-party schedulers.
Each installer provides scripts for three possible submission modes:
Shared – The client can submit directly to the scheduler, and the client and the cluster nodes (or machines) have a shared file system.
Remote – The client and cluster nodes have a shared file system, but the client machine cannot submit directly to the scheduler, such as when the client utilities of the scheduler are not installed. This mode uses the ssh protocol to submit commands to the scheduler using a remote host.
Nonshared – The client and cluster nodes do not have a shared file system. This mode uses the ssh protocol to submit commands to the scheduler using a remote host, and it uses the sftp protocol to copy job and task files to the cluster file system.
Each submission mode has its own subfolder within the installation folder. This subfolder contains a README file that provides specific instructions on how to use the scripts. Before using the scripts, decide which submission mode describes your network setup.
To run the installer, download the appropriate support package for your scheduler, and open it in your MATLAB client. The installer includes a wizard to guide you through creating a cluster profile for your cluster configuration.
If you want to customize the behavior of the plugin scripts, you can set additional properties, such as AdditionalSubmitArgs. For more information, see Customize Behavior of Sample Plugin Scripts (MATLAB Parallel Server).
If none of the support packages supports your scheduler or cluster configuration, modify the scripts of one of these packages. For more information on how to write a set of plugin scripts for generic schedulers, see Writing Custom Plugin Scripts.
The sample plugin scripts use wrapper scripts to simplify the implementation of independentSubmitFcn.m and communicatingSubmitFcn.m. These scripts are not required; however, using them is good practice to make your code more readable. This table describes these scripts:
File name | Description |
independentJobWrapper.sh | Used in independentSubmitFcn.m to embed a call to the MATLAB executable with the appropriate arguments. It uses environment variables for the location of the executable and its arguments. For an example of its use, see Sample script for a SLURM scheduler. |
communicatingJobWrapper.sh | Used in communicatingSubmitFcn.m to distribute a communicating job in your cluster. This script implements the steps in Submit scheduler job to launch MPI process. For an example of its use, see Sample script for a SLURM scheduler. |
Note
When writing your own plugin scripts, it is a good practice to start by modifying one of the sample plugin scripts that most closely matches your setup (see Sample Plugin Scripts).
When you submit an independent job to a generic cluster, the independentSubmitFcn.m function executes in the MATLAB client session.
The declaration line of this function must be:
function independentSubmitFcn(cluster,job,environmentProperties)
Each task in a MATLAB independent job corresponds to a single job on your scheduler. The purpose of this function is to submit N jobs to your third-party scheduler, where N is the number of tasks in the independent job. Each of these jobs must:
Set the five environment variables required by the worker MATLAB to identify the individual task to run. For more information, see Configure the worker environment.
Call the appropriate MATLAB executable to start the MATLAB worker and run the task. For more information, see Submit scheduler jobs to run MATLAB workers.
Configure the worker environment. This table identifies the five environment variables and values that must be set on the worker MATLAB to run an individual task:
Environment Variable Name | Environment Variable Value |
PARALLEL_SERVER_DECODE_FUNCTION | 'parallel.cluster.generic.independentDecodeFcn' |
PARALLEL_SERVER_STORAGE_CONSTRUCTOR | environmentProperties.StorageConstructor |
PARALLEL_SERVER_STORAGE_LOCATION | environmentProperties.StorageLocation |
PARALLEL_SERVER_JOB_LOCATION | environmentProperties.JobLocation |
PARALLEL_SERVER_TASK_LOCATION | environmentProperties.TaskLocations{n} for the nth task |
Many schedulers support copying the client environment as part of the submission command. If so, you can set the previous environment variables in the client, so the scheduler can copy them to the worker environment. If not, you must modify your submission command to forward these variables.
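If your scheduler does not copy the client environment automatically, one way to forward the variables is to name them explicitly in the submission command. The following is a sketch for SLURM, whose sbatch command accepts an --export flag listing variables to forward from the submission environment; adjust the flag for your scheduler:

```shell
# Sketch: forward the five required variables explicitly when the
# scheduler does not copy the client environment by default.
sbatch --ntasks=1 \
  --export=PARALLEL_SERVER_DECODE_FUNCTION,PARALLEL_SERVER_STORAGE_CONSTRUCTOR,PARALLEL_SERVER_STORAGE_LOCATION,PARALLEL_SERVER_JOB_LOCATION,PARALLEL_SERVER_TASK_LOCATION \
  independentJobWrapper.sh
```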
Submit scheduler jobs to run MATLAB workers. Once the five required parameters for a given job and task are defined on a worker, the task is run by calling the MATLAB executable with suitable arguments. The MATLAB executable to call is defined in environmentProperties.MatlabExecutable. The arguments to pass are defined in environmentProperties.MatlabArguments.
Note
If you cannot submit directly to your scheduler from the client machine, see Submitting from a Remote Host for instructions on how to submit using ssh.
Sample script for a SLURM scheduler. This script shows a basic submit function for a SLURM scheduler with a shared file system. For a more complete example, see Sample Plugin Scripts.
function independentSubmitFcn(cluster,job,environmentProperties)
% Specify the required environment variables.
setenv('PARALLEL_SERVER_DECODE_FUNCTION', 'parallel.cluster.generic.independentDecodeFcn');
setenv('PARALLEL_SERVER_STORAGE_CONSTRUCTOR', environmentProperties.StorageConstructor);
setenv('PARALLEL_SERVER_STORAGE_LOCATION', environmentProperties.StorageLocation);
setenv('PARALLEL_SERVER_JOB_LOCATION', environmentProperties.JobLocation);

% Specify the MATLAB executable and arguments to run on the worker.
% These are used in the independentJobWrapper.sh script.
setenv('PARALLEL_SERVER_MATLAB_EXE', environmentProperties.MatlabExecutable);
setenv('PARALLEL_SERVER_MATLAB_ARGS', environmentProperties.MatlabArguments);

for ii = 1:environmentProperties.NumberOfTasks
    % Specify the environment variable required to identify which task to run.
    setenv('PARALLEL_SERVER_TASK_LOCATION', environmentProperties.TaskLocations{ii});
    % Specify the command to submit the job to the SLURM scheduler.
    % SLURM will automatically copy environment variables to workers.
    commandToRun = 'sbatch --ntasks=1 independentJobWrapper.sh';
    [cmdFailed, cmdOut] = system(commandToRun);
end
end
The previous example submits a simple bash script, independentJobWrapper.sh, to the scheduler. The independentJobWrapper.sh script embeds the MATLAB executable and arguments using environment variables:
#!/bin/sh
# PARALLEL_SERVER_MATLAB_EXE - the MATLAB executable to use
# PARALLEL_SERVER_MATLAB_ARGS - the MATLAB args to use
exec "${PARALLEL_SERVER_MATLAB_EXE}" ${PARALLEL_SERVER_MATLAB_ARGS}
When you submit a communicating job to a generic cluster, the communicatingSubmitFcn.m function executes in the MATLAB client session.
The declaration line of this function must be:
function communicatingSubmitFcn(cluster,job,environmentProperties)
The purpose of this function is to submit a single job to your scheduler. This job must:
Set the four environment variables required by the MATLAB workers to identify the job to run. For more information, see Configure the worker environment.
Call MPI to distribute your job to N MATLAB workers. N corresponds to the maximum value specified in the NumWorkersRange property of the MATLAB job. For more information, see Submit scheduler job to launch MPI process.
Configure the worker environment. This table identifies the four environment variables and values that must be set on the worker MATLAB to run a task of a communicating job:
Environment Variable Name | Environment Variable Value |
PARALLEL_SERVER_DECODE_FUNCTION | 'parallel.cluster.generic.communicatingDecodeFcn' |
PARALLEL_SERVER_STORAGE_CONSTRUCTOR | environmentProperties.StorageConstructor |
PARALLEL_SERVER_STORAGE_LOCATION | environmentProperties.StorageLocation |
PARALLEL_SERVER_JOB_LOCATION | environmentProperties.JobLocation |
Many schedulers support copying the client environment as part of the submission command. If so, you can set the previous environment variables in the client, so the scheduler can copy them to the worker environment. If not, you must modify your submission command to forward these variables.
Submit scheduler job to launch MPI process. After you define the four required parameters for a given job, run your job by launching N worker MATLAB processes using mpiexec.
mpiexec is software shipped with Parallel Computing Toolbox™ that implements the Message Passing Interface (MPI) standard to allow communication between the worker MATLAB processes. For more information about mpiexec, see the MPICH home page.
To run your job, you must submit a job to your scheduler that executes the following steps. Note that matlabroot refers to the MATLAB installation location on your worker nodes.
Request N processes from the scheduler. N corresponds to the maximum value specified in the NumWorkersRange property of the MATLAB job.
Call mpiexec to start worker MATLAB processes. The number of worker MATLAB processes to start on each host should match the number of processes allocated by your scheduler. The mpiexec executable is located at matlabroot/bin/mw_mpiexec.
The mpiexec command automatically forwards environment variables to the launched processes. Therefore, ensure the environment variables listed in Configure the worker environment are set before running mpiexec.
To learn more about options for mpiexec, see Using the Hydra Process Manager.
Note
For a complete example of the previous steps, see the communicatingJobWrapper.sh script provided with any of the sample plugin scripts in Sample Plugin Scripts. Use this script as a starting point if you need to write your own script.
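The steps in Submit scheduler job to launch MPI process can be sketched as a minimal SLURM wrapper script. This sketch is not the shipped communicatingJobWrapper.sh: it assumes the environment variables from Configure the worker environment are forwarded by the scheduler, that PARALLEL_SERVER_MATLAB_EXE and PARALLEL_SERVER_MATLAB_ARGS are set as in the submit function examples, and that PARALLEL_SERVER_CMR holds the MATLAB installation root on the worker nodes.

```shell
#!/bin/sh
# Minimal communicating-job wrapper sketch for SLURM.
# Assumes the scheduler forwarded:
#   PARALLEL_SERVER_CMR         - matlabroot on the worker nodes
#   PARALLEL_SERVER_MATLAB_EXE  - MATLAB executable to run
#   PARALLEL_SERVER_MATLAB_ARGS - arguments for the executable
# SLURM sets SLURM_NTASKS to the number of processes granted by
# sbatch --ntasks, which matches the mpiexec process count.
exec "${PARALLEL_SERVER_CMR}/bin/mw_mpiexec" -n "${SLURM_NTASKS}" \
    "${PARALLEL_SERVER_MATLAB_EXE}" ${PARALLEL_SERVER_MATLAB_ARGS}
```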
Sample script for a SLURM scheduler. The following script shows a basic submit function for a SLURM scheduler with a shared file system. The submitted job is contained in a bash script, communicatingJobWrapper.sh. This script implements the relevant steps in Submit scheduler job to launch MPI process for a SLURM scheduler. For a more complete example, see Sample Plugin Scripts.
function communicatingSubmitFcn(cluster,job,environmentProperties)
% Specify the four required environment variables.
setenv('PARALLEL_SERVER_DECODE_FUNCTION', 'parallel.cluster.generic.communicatingDecodeFcn');
setenv('PARALLEL_SERVER_STORAGE_CONSTRUCTOR', environmentProperties.StorageConstructor);
setenv('PARALLEL_SERVER_STORAGE_LOCATION', environmentProperties.StorageLocation);
setenv('PARALLEL_SERVER_JOB_LOCATION', environmentProperties.JobLocation);

% Specify the MATLAB executable and arguments to run on the worker.
% Specify the location of the MATLAB install on the cluster nodes.
% These are used in the communicatingJobWrapper.sh script.
setenv('PARALLEL_SERVER_MATLAB_EXE', environmentProperties.MatlabExecutable);
setenv('PARALLEL_SERVER_MATLAB_ARGS', environmentProperties.MatlabArguments);
setenv('PARALLEL_SERVER_CMR', cluster.ClusterMatlabRoot);

numberOfTasks = environmentProperties.NumberOfTasks;

% Specify the command to submit a job to the SLURM scheduler which
% requests as many processes as tasks in the job.
% SLURM will automatically copy environment variables to workers.
commandToRun = sprintf('sbatch --ntasks=%d communicatingJobWrapper.sh', numberOfTasks);
[cmdFailed, cmdOut] = system(commandToRun);
end
When you query the state of a job created with a generic cluster, the getJobStateFcn.m function executes in the MATLAB client session. The declaration line of this function must be:
function state = getJobStateFcn(cluster,job,state)
When using a third-party scheduler, the scheduler can have more up-to-date information about your jobs than what is available to the toolbox from the local job storage location. This is especially likely when using a nonshared file system, where the remote file system can be slow to propagate large data files back to your local data location.
To retrieve that information from the scheduler, add a function called getJobStateFcn.m to the PluginScriptsLocation of your cluster.
The state passed into this function is the state derived from the local job storage. The body of this function can then query the scheduler to determine a more accurate state for the job and return it in place of the stored state. The function you write for this purpose must return a valid value for the state of a job object. Allowed values are 'pending', 'queued', 'running', 'finished', or 'failed'.
For instructions on pairing MATLAB tasks with their corresponding scheduler job ID, see Managing Jobs with Generic Scheduler.
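As an illustration, a minimal getJobStateFcn.m for a SLURM scheduler might query squeue for each stored scheduler job ID. This is a sketch, not a shipped sample: it assumes the submit function stored the scheduler job IDs in a ClusterJobIDs field using setJobClusterData (as shown in Managing Jobs with Generic Scheduler), and it only distinguishes running jobs, returning the stored state otherwise.

```matlab
function state = getJobStateFcn(cluster, job, state)
% Sketch for a SLURM scheduler. Assumes the submit function stored the
% scheduler job IDs with setJobClusterData in a ClusterJobIDs field.
data = cluster.getJobClusterData(job);
jobIDs = data.ClusterJobIDs;
for ii = 1:numel(jobIDs)
    % squeue -h suppresses the header; -o %T prints only the state name.
    commandToRun = sprintf('squeue -j %s -h -o %%T', jobIDs{ii});
    [cmdFailed, cmdOut] = system(commandToRun);
    if ~cmdFailed && contains(cmdOut, 'RUNNING')
        % At least one scheduler job is still running.
        state = 'running';
        return
    end
end
end
```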
When you cancel a job created with a generic cluster, the cancelJobFcn.m function executes in the MATLAB client session. The declaration line of this function must be:
function OK = cancelJobFcn(cluster,job)
When you cancel a job created using the generic scheduler interface, by default this action affects only the job data in storage. To cancel the corresponding jobs on your scheduler, you must provide instructions on what to do and when to do it to the scheduler. To achieve this, add a function called cancelJobFcn.m to the PluginScriptsLocation of your cluster.
The body of this function can then send a command to the scheduler, for example, to remove the corresponding jobs from the queue. The function must return a logical scalar indicating the success or failure of canceling the jobs on the scheduler.
For instructions on pairing MATLAB tasks with their corresponding scheduler job ID, see Managing Jobs with Generic Scheduler.
When you cancel a task created with a generic cluster, the cancelTaskFcn.m function executes in the MATLAB client session. The declaration line of this function must be:
function OK = cancelTaskFcn(cluster,task)
When you cancel a task created using the generic scheduler interface, by default this action affects only the task data in storage. To cancel the corresponding job on your scheduler, you must provide instructions on what to do and when to do it to the scheduler. To achieve this, add a function called cancelTaskFcn.m to the PluginScriptsLocation of your cluster.
The body of this function can then send a command to the scheduler, for example, to remove the corresponding job from the scheduler queue. The function must return a logical scalar indicating the success or failure of canceling the job on the scheduler.
For instructions on pairing MATLAB tasks with their corresponding scheduler job ID, see Managing Jobs with Generic Scheduler.
When you delete a job created with a generic cluster, the deleteJobFcn.m function executes in the MATLAB client session. The declaration line of this function must be:
function deleteJobFcn(cluster,job)
When you delete a job created using the generic scheduler interface, by default this action affects only the job data in storage. To remove the corresponding jobs on your scheduler, you must provide instructions on what to do and when to do it to the scheduler. To achieve this, add a function called deleteJobFcn.m to the PluginScriptsLocation of your cluster.
The body of this function can then send a command to the scheduler, for example, to remove the corresponding jobs from the scheduler queue.
For instructions on pairing MATLAB tasks with their corresponding scheduler job ID, see Managing Jobs with Generic Scheduler.
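As an illustration, a minimal deleteJobFcn.m for a SLURM scheduler could mirror the cancelJobFcn pattern and remove the scheduler jobs with scancel. This is a sketch; it assumes the submit function stored the scheduler job IDs in a ClusterJobIDs field with setJobClusterData.

```matlab
function deleteJobFcn(cluster, job)
% Sketch for a SLURM scheduler. Assumes ClusterJobIDs was stored with
% setJobClusterData at submission time.
data = cluster.getJobClusterData(job);
jobIDs = data.ClusterJobIDs;
for ii = 1:numel(jobIDs)
    % Remove the corresponding job from the SLURM queue.
    commandToRun = sprintf('scancel ''%s''', jobIDs{ii});
    [cmdFailed, cmdOut] = system(commandToRun);
end
end
```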
When you delete a task created with a generic cluster, the deleteTaskFcn.m function executes in the MATLAB client session. The declaration line of this function must be:
function deleteTaskFcn(cluster,task)
When you delete a task created using the generic scheduler interface, by default this action affects only the task data in storage. To remove the corresponding job on your scheduler, you must provide instructions on what to do and when to do it to the scheduler. To achieve this, add a function called deleteTaskFcn.m to the PluginScriptsLocation of your cluster.
The body of this function can then send a command to the scheduler, for example, to remove the corresponding job from the scheduler queue.
For instructions on pairing MATLAB tasks with their corresponding scheduler job ID, see Managing Jobs with Generic Scheduler.
After you create an instance of your cluster in MATLAB, the postConstructFcn.m function executes in the MATLAB client session. For example, the following line of code creates an instance of your cluster and runs the postConstructFcn function associated with the 'myProfile' cluster profile:
c = parcluster('myProfile');
The declaration line of the postConstructFcn function must be:
function postConstructFcn(cluster)
If you need to perform custom configuration of your cluster before its use, add a function called postConstructFcn.m to the PluginScriptsLocation of your cluster. The body of this function can contain any extra setup steps you require.
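As a hypothetical illustration, a postConstructFcn.m could seed a default AdditionalProperties value when the profile does not define one. The property name and value below are examples only; substitute whatever setup your site requires.

```matlab
function postConstructFcn(cluster)
% Hypothetical example: supply a site default for a property the
% profile might not define. Adjust or remove for your site.
if ~isprop(cluster.AdditionalProperties, 'RemoteJobStorageLocation')
    cluster.AdditionalProperties.RemoteJobStorageLocation = '/tmp/jobs';
end
end
```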
If you need to modify the functionality of your plugin scripts at run time, then use the AdditionalProperties property of the generic scheduler interface.
As an example, consider the SLURM scheduler. The submit command for SLURM accepts a --nodelist argument that allows you to specify the nodes to run on. You can change the value of this argument without having to modify your plugin scripts. To add this functionality, include the following code pattern in your independentSubmitFcn.m and communicatingSubmitFcn.m scripts:
% Basic SLURM submit command
submitCommand = 'sbatch';
% Check if property is defined
if isprop(cluster.AdditionalProperties, 'NodeList')
    % Add appropriate argument and value to submit string
    submitCommand = [submitCommand ' --nodelist=' cluster.AdditionalProperties.NodeList];
end
For an example of how to use this coding pattern, see the nonshared submit functions of the scripts in Sample Plugin Scripts.
With the modification to your scripts in the previous example, you can add an AdditionalProperties entry to your generic cluster profile to specify a list of nodes to use. This entry also documents the customization you added to your plugin scripts for anyone with whom you share the cluster profile.
To add the NodeList property to your cluster profile:
Start the Cluster Profile Manager from the MATLAB desktop by selecting Parallel > Manage Cluster Profiles.
Select the profile for your generic cluster, and click Edit.
Navigate to the AdditionalProperties table, and click Add.
Enter NodeList as the Name.
Set String as the Type.
Set the Value to the list of nodes.
With the modification to your scripts in Adding User Customization, you can edit the list of nodes from the MATLAB command line by setting the appropriate property of the cluster object before submitting a job:
c = parcluster;
c.AdditionalProperties.NodeList = 'gpuNodeName';
j = c.batch('myScript');
Display the AdditionalProperties object to see all currently defined properties and their values:
>> c.AdditionalProperties

ans =

  AdditionalProperties with properties:

                 ClusterHost: 'myClusterHost'
                    NodeList: 'gpuNodeName'
    RemoteJobStorageLocation: '/tmp/jobs'
The first requirement for job management is to identify the jobs on the scheduler corresponding to a MATLAB job object. When you submit a job to the scheduler, the command that does the submission in your submit function can return some data about the job from the scheduler. This data typically includes a job ID. By storing that scheduler job ID with the MATLAB job object, you can later refer to the scheduler job by this job ID when you send management commands to the scheduler. Similarly, you can store a map of MATLAB task IDs to scheduler job IDs to help manage individual tasks. The toolbox function that stores this cluster data is setJobClusterData.
This example shows how to modify the independentSubmitFcn.m function to parse the output of each command submitted to a SLURM scheduler. You can use regular expressions to extract the scheduler job ID for each task and then store it using setJobClusterData.
% Pattern to extract scheduler job ID from SLURM sbatch output
searchPattern = '.*Submitted batch job ([0-9]+).*';

jobIDs = cell(numberOfTasks, 1);
for ii = 1:numberOfTasks
    setenv('PARALLEL_SERVER_TASK_LOCATION', environmentProperties.TaskLocations{ii});
    commandToRun = 'sbatch --ntasks=1 independentJobWrapper.sh';
    [cmdFailed, cmdOut] = system(commandToRun);
    jobIDs{ii} = regexp(cmdOut, searchPattern, 'tokens', 'once');
end
% Set the job IDs on the job cluster data
cluster.setJobClusterData(job, struct('ClusterJobIDs', {jobIDs}));
This example modifies the cancelJobFcn.m function to cancel the corresponding jobs on the SLURM scheduler. The example uses getJobClusterData to retrieve the job scheduler data.
function OK = cancelJobFcn(cluster, job)
% Get the scheduler information for this job
data = cluster.getJobClusterData(job);
jobIDs = data.ClusterJobIDs;
for ii = 1:length(jobIDs)
    % Tell the SLURM scheduler to cancel the job
    commandToRun = sprintf('scancel ''%s''', jobIDs{ii});
    [cmdFailed, cmdOut] = system(commandToRun);
end
OK = true;
If the MATLAB client is unable to submit directly to your scheduler, use parallel.cluster.RemoteClusterAccess to establish a connection and run commands on a remote host. This object uses the ssh protocol, and hence requires an ssh daemon service running on the remote host. To establish a connection, you must provide either a user name and password for the remote host or a valid identity file.
The following code executes a command on a remote host, remoteHostname, as the user user.
% This will prompt for the password of user
access = parallel.cluster.RemoteClusterAccess.getConnectedAccess('remoteHostname', 'user');
% Execute a command on remoteHostname
[cmdFailed, cmdOut] = access.runCommand(commandToRun);
For an example of plugin scripts using remote host submission, see the remote submission mode in Sample Plugin Scripts.
If the MATLAB client does not have a shared file system with the cluster nodes, use parallel.cluster.RemoteClusterAccess to establish a connection and copy job and task files between the client and cluster nodes. This object uses the ssh protocol, and hence requires an ssh daemon service running on the remote host. To establish a connection, you must provide either a user name and password for the remote host or a valid identity file.
When using nonshared submission, you must specify both a local job storage location to use on the client and a remote job storage location to use on the cluster. The remote job storage location must be available to all nodes of the cluster.
parallel.cluster.RemoteClusterAccess uses file mirroring to continuously synchronize the local job and task files with those on the cluster. When file mirroring first starts, local job and task files are uploaded to the remote job storage location. As the job executes, the file mirroring continuously checks the remote job storage location for new files and updates, and copies the files to the local storage on the client. This procedure ensures the MATLAB client always has an up-to-date view of the jobs and tasks executing on the scheduler.
This example connects to the remote host, remoteHostname, as the user user, and establishes /remote/storage as the remote cluster storage location to synchronize with. It then starts file mirroring for a job, copying the local files of the job to /remote/storage on the cluster and syncing any changes back to the local machine.
% This will prompt for the password of user
access = parallel.cluster.RemoteClusterAccess.getConnectedAccessWithMirror('remoteHostname', '/remote/storage', 'user');
% Start file mirroring for a job
access.startMirrorForJob(job);
For an example of plugin scripts without a shared file system, see the nonshared submission mode in Sample Plugin Scripts.