Our cluster has 2 Haswell sockets per node, each with 12 cores (24 cores/node).
Using: intel/15.1.133, impi/5.0.3.048
Regardless of which of the options named in the subject line is used, ranks are always placed in round-robin fashion. The commands are run in a batch job that generates a host file. When the job is submitted as follows, the host file contains lines like those shown:
qsub -l nodes=2:ppn=1 ...
tfe02.% cat hostfile
t0728
t0731
tfe02.%
As an aside, it looks like "-ordered-output" is also being ignored. I understand that ordered output is a little difficult to achieve; I only wanted it for better readability. So please note that the ranks are not printed in order.
With "-perhost 2" I was expecting ranks 0 and 1 to be on the same node:
-------------
cat /var/spool/torque/aux//889322.bqs5
s0014
s0015
mpirun -ordered-output -np 4 -perhost 2 ./hello_mpi_c-intel-impi
Hello from rank 01 out of 4; procname = s0015, cpuid = 12
Hello from rank 03 out of 4; procname = s0015, cpuid = 24
Hello from rank 02 out of 4; procname = s0014, cpuid = 0
Hello from rank 00 out of 4; procname = s0014, cpuid = 12
---------
The help output from mpirun indicates "-perhost" and "-ppn" are equivalent:
----------
cat /var/spool/torque/aux//889321.bqs5
s0014
s0015
mpirun -ordered-output -np 4 -ppn 2 ./hello_mpi_c-intel-impi
Hello from rank 00 out of 4; procname = s0014, cpuid = 12
Hello from rank 02 out of 4; procname = s0014, cpuid = 0
Hello from rank 01 out of 4; procname = s0015, cpuid = 12
Hello from rank 03 out of 4; procname = s0015, cpuid = 24
--------
Again, "-grr" output is not what was expected:
----------------
cat /var/spool/torque/aux//889323.bqs5
s0014
s0015
mpirun -ordered-output -np 4 -grr 2 ./hello_mpi_c-intel-impi
Hello from rank 02 out of 4; procname = s0014, cpuid = 2
Hello from rank 00 out of 4; procname = s0014, cpuid = 12
Hello from rank 03 out of 4; procname = s0015, cpuid = 24
Hello from rank 01 out of 4; procname = s0015, cpuid = 12
I'm including code that has not been cleaned up below :-(
Please ignore parts that are not relevant.
#define _GNU_SOURCE /* See feature_test_macros(7); must come before any #include so sched_getcpu() is declared */
#include <stdio.h>
#include <stdlib.h> /* malloc(), used only in the BUG block */
#include <mpi.h>
#include <sched.h>

int main(int argc, char **argv)
{
    int ierr, myid, npes;
    int len, i;
    char name[MPI_MAX_PROCESSOR_NAME];

    ierr = MPI_Init(&argc, &argv);
#ifdef MACROTEST
#define MACROTEST 10
#endif
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &npes);
    ierr = MPI_Get_processor_name(name, &len);
#ifdef SLEEP
    for (i=1; i<1e1150; i++) ;
#endif
    /* Report rank, host name, and the CPU this rank is currently running on */
    printf("Hello from rank %2.2d out of %d; procname = %s, cpuid = %d\n", myid, npes, name, sched_getcpu());
#ifdef MACROTEST
    printf("Test Macro: %d\n", MACROTEST);
#endif
#ifdef BUG
    {
        int* x = (int*)malloc(10 * sizeof(int));
        x[10] = 0; // problem 1: heap block overrun
        printf("Print something %d\n",x[10]);
    } // problem 2: memory leak -- x not freed
#endif
    ierr = MPI_Finalize();
    return 0;
}