Dear developers,
the round-robin placement forgets about the -perhost parameter once it has iterated over all hosts in the hostfile.
This was tested with Intel MPI 2019.1.
My hostfile looks like:
node551
node552
And when I start a small job, I get:
I_MPI_DEBUG=4 I_MPI_PIN_DOMAIN=core mpirun -f hostfile -n 8 -perhost 2 ./a.out
[0] MPI startup(): libfabric version: 1.7.0a1-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
[0] MPI startup(): Rank  Pid     Node name  Pin cpu
[0] MPI startup(): 0     377136  node551    {0,40}
[0] MPI startup(): 1     377137  node551    {1,41}
[0] MPI startup(): 2     151304  node552    {0,40}
[0] MPI startup(): 3     151305  node552    {1,41}
[0] MPI startup(): 4     377138  node551    {2,42}
[0] MPI startup(): 5     151306  node552    {2,42}
[0] MPI startup(): 6     377139  node551    {3,43}
[0] MPI startup(): 7     151307  node552    {3,43}
Ranks 0-3 are distributed as expected (two consecutive ranks per host), but ranks 4-7 alternate between the hosts as if perhost had been reset to 1. With -perhost 2 I would expect ranks 4 and 5 on node551 and ranks 6 and 7 on node552.
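
For reference, the test program does not matter here, since the placement is already visible in the I_MPI_DEBUG startup output; any MPI program should reproduce it. A minimal sketch of an a.out that prints its own rank-to-host mapping for cross-checking could look like this (hypothetical reproducer, not the exact program used above):

/* Minimal MPI reproducer: print which host each rank landed on. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    printf("rank %d running on %s\n", rank, host);
    MPI_Finalize();
    return 0;
}

Compiled with mpiicc and launched with the mpirun line above, its output shows the same alternating placement for ranks 4-7.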