I was pleasantly surprised to read that PMI2 under SLURM is supported by Intel MPI in the 2017 release. I tested it, but it fails immediately on my setup, which uses Intel Parallel Studio 2017 Update 4 and SLURM 15.08.13. Even a simple MPI program doesn't work:
[donners@int1 pmi2]$ cat mpi.f90
program test
  use mpi
  implicit none
  integer ierr,nprocs,rank

  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,nprocs,ierr)
  call mpi_comm_rank(mpi_comm_world,rank,ierr)
  if (rank .eq. 0) then
    print *,'Number of processes: ',nprocs
  endif
  print *,'I am rank ',rank
  call mpi_finalize(ierr)
end
[donners@int1 pmi2]$ mpiifort mpi.f90
[donners@int1 pmi2]$ ldd ./a.out
    linux-vdso.so.1 => (0x00007ffcc0364000)
    libmpifort.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002ad7432a9000)
    libmpi.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/release_mt/libmpi.so.12 (0x00002ad743652000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002ad744397000)
    librt.so.1 => /lib64/librt.so.1 (0x00002ad74459c000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ad7447a4000)
    libm.so.6 => /lib64/libm.so.6 (0x00002ad7449c1000)
    libc.so.6 => /lib64/libc.so.6 (0x00002ad744c46000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ad744fda000)
    /lib64/ld-linux-x86-64.so.2 (0x00002ad743086000)
[donners@int1 pmi2]$ I_MPI_PMI2=yes srun -n 1 --mpi=pmi2 ./a.out
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPID_Init:2104
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1716)......: channel initialization failed
MPID_Init(2104)......: fail failed
srun: error: tcn1467: task 0: Exited with exit code 15
srun: Terminating job step 3270641.0
[donners@int1 pmi2]$ srun --version
slurm 15.08.13-Bull.1.0
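I'm assuming the pmi2 plugin itself is in place on the cluster; as a sanity check, the plugins a SLURM installation supports can be listed with:

    # Quick check: list the MPI plugins this SLURM installation supports;
    # "pmi2" must appear in the output for srun --mpi=pmi2 to be usable.
    srun --mpi=list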
The same failure occurs on a system running SLURM 17.02.3 (at TACC). What might be the problem here?
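In case it helps narrow things down: would explicitly pointing Intel MPI at SLURM's PMI2 library be expected to make a difference? A sketch of what I have in mind, assuming the 2017 release honours I_MPI_PMI_LIBRARY (I'm not certain it does) and with a library path that is only a guess for my installation:

    # Sketch only: force Intel MPI to load SLURM's PMI2 library directly.
    # Assumptions: I_MPI_PMI_LIBRARY is honoured by this release, and
    # /usr/lib64/libpmi2.so is where this installation keeps the library.
    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
    export I_MPI_PMI2=yes
    srun -n 1 --mpi=pmi2 ./a.out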
With regards,
John