The DSDP package can also be run in parallel on multiple processors. In the parallel version, PDSDP, the Schur complement matrix is computed and solved in parallel. The parallel Cholesky factorization in PDSDP is performed using PLAPACK [9], which distributes the data in a two-dimensional block-cyclic structure. The blocking parameter in PLAPACK determines how many rows and columns are in each block. Larger block sizes reduce the overhead of passing messages, while smaller block sizes balance the work among the processors more equitably. After experimenting with several choices, PDSDP uses a blocking parameter of 32. Since PLAPACK uses a dense matrix structure, this version is not appropriate when the Schur complement matrix is sparse.
The following steps should be used to run an application in parallel using PDSDP.
A PDSDP executable can be used much like the serial version of DSDP that reads SDPA files. Given an SDPA file such as truss1.dat-s, the command

    mpirun -np 2 dsdp5 truss1.dat-s -log_summary

will solve the problem using two processors. Additional processors may also be used. This implementation is best suited for very large problems.
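For example, the same problem can be solved on four processors with

    mpirun -np 4 dsdp5 truss1.dat-s -log_summary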
Use of PDSDP as a subroutine library is also very similar to the use of the serial version of the solver. The application must create the solver and conic object on each processor and provide each processor with a copy of the data matrices, objective vector, and options. At the end of the algorithm, each solver has a copy of the solution. The routines to set the data and retrieve the solution are the same.
The few differences between the serial and parallel versions are listed below; a short driver sketch follows the list.

- Include the header file pdsdp5plapack.h.

- After creating the DSDP solver object, the application should call

      int PDSDPUsePLAPACKLinearSolver(DSDP dsdp, MPI_Comm comm);

  Most applications can set the variable comm to MPI_COMM_WORLD.

- Only one processor should print the standard output. For example,

      MPI_Comm_rank(MPI_COMM_WORLD,&rank);
      if (rank==0){ info = DSDPSetStandardMonitor(dsdp); }
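The sketch below illustrates how these pieces might fit together in a parallel driver. It assumes the serial interface routines DSDPCreate, DSDPCreateSDPCone, DSDPSetDualObjective, DSDPSetup, DSDPSolve, DSDPGetY, and DSDPDestroy; the dual objective is filled with placeholder values, and the calls that set the constraint matrices on the cone are elided because they are identical to the serial case.

    #include <mpi.h>
    #include <stdio.h>
    #include "dsdp5.h"            /* serial DSDP interface */
    #include "pdsdp5plapack.h"    /* declares PDSDPUsePLAPACKLinearSolver() */

    int main(int argc, char *argv[]){
      DSDP    dsdp;
      SDPCone sdpcone;
      int     info, rank, i;
      int     m = 2;              /* number of dual variables (illustrative size) */
      double  y[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Every processor creates the solver and cone and receives a full
         copy of the problem data, exactly as in the serial interface. */
      info = DSDPCreate(m, &dsdp);
      info = DSDPCreateSDPCone(dsdp, 1, &sdpcone);
      for (i = 1; i <= m; i++){
        info = DSDPSetDualObjective(dsdp, i, 1.0);  /* placeholder objective values */
      }
      /* ... set block sizes and constraint matrices on sdpcone here,
         using the same SDPCone routines as the serial solver ... */

      /* Select the parallel PLAPACK Cholesky for the Schur complement. */
      info = PDSDPUsePLAPACKLinearSolver(dsdp, MPI_COMM_WORLD);

      /* Only one processor prints the standard output. */
      if (rank == 0){ info = DSDPSetStandardMonitor(dsdp); }

      info = DSDPSetup(dsdp);
      info = DSDPSolve(dsdp);

      /* At the end of the algorithm, each solver holds a copy of the solution. */
      info = DSDPGetY(dsdp, y, m);
      if (rank == 0){ printf("y[1] = %g\n", y[0]); }

      info = DSDPDestroy(dsdp);
      MPI_Finalize();
      return 0;
    }

Such a driver is launched with mpirun in the same way as the SDPA reader above.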
An example of this usage is provided in DSDPROOT/pdsdp/plapack/readsdpa.c. Scalable performance on medium- and large-scale problems has been achieved on up to 64 processors. See [1] for more details.
Source code that uses the parallel conjugate gradient method in PETSc to solve the linear systems is also included in the distribution.