Channel: Clusters and HPC Technology

-perhost parameter forgotten after first iteration over all hosts


Dear developers,

The round-robin placement forgets the -perhost parameter once it has iterated over all hosts in the hostfile.
This was tested with Intel MPI 2019.1.

My hostfile looks like:

node551
node552

And when I start a small job, I get:

I_MPI_DEBUG=4 I_MPI_PIN_DOMAIN=core mpirun -f hostfile -n 8 -perhost 2  ./a.out
[0] MPI startup(): libfabric version: 1.7.0a1-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       377136   node551   {0,40}
[0] MPI startup(): 1       377137   node551   {1,41}
[0] MPI startup(): 2       151304   node552   {0,40}
[0] MPI startup(): 3       151305   node552   {1,41}
[0] MPI startup(): 4       377138   node551   {2,42}
[0] MPI startup(): 5       151306   node552   {2,42}
[0] MPI startup(): 6       377139   node551   {3,43}
[0] MPI startup(): 7       151307   node552   {3,43}

Ranks 0-3 are distributed as expected (two consecutive ranks per host), but ranks 4-7 alternate between the hosts one rank at a time, as if the -perhost parameter had been reset to 1.
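
To make the difference concrete, here is a small Python sketch that models both behaviours. It is only an illustration of the placement I expected versus the placement shown in the log above, not Intel MPI's actual implementation; the function name place_ranks and the flag reset_after_first_pass are made up for this example.

```python
def place_ranks(hosts, n_ranks, perhost, reset_after_first_pass=False):
    """Assign ranks to hosts in blocks of `perhost`, cycling over `hosts`.

    With reset_after_first_pass=True the block size drops to 1 once every
    host has been visited, which mimics the placement seen in the log.
    (Hypothetical model, not taken from the Intel MPI sources.)
    """
    placement = []
    block = perhost
    while len(placement) < n_ranks:
        for host in hosts:
            for _ in range(block):
                if len(placement) == n_ranks:
                    break
                placement.append(host)
        if reset_after_first_pass:
            block = 1  # models -perhost being forgotten after the first pass
    return placement


hosts = ["node551", "node552"]
expected = place_ranks(hosts, 8, 2)
observed = place_ranks(hosts, 8, 2, reset_after_first_pass=True)

for rank, (exp, obs) in enumerate(zip(expected, observed)):
    marker = "" if exp == obs else "  <-- differs"
    print(f"rank {rank}: expected {exp}, observed {obs}{marker}")
```

Under this model, the expected placement puts ranks 4 and 5 on node551 and ranks 6 and 7 on node552, while the observed placement puts ranks 4 and 6 on node551 and ranks 5 and 7 on node552, which matches the debug output above.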

