
mpirun: unexpected disconnect completion event


Hi,

I've been running on 5 distributed-memory nodes (each with 20 processors) using mpirun -n 5 -ppn 1 -hosts nd1,nd2,nd3,nd4,nd5.
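
For reference, here is the exact launch line (the executable name ./my_solver is just a placeholder for my binary); with -n 5 -ppn 1 this places exactly one MPI rank on each of the five hosts:

mpirun -n 5 -ppn 1 -hosts nd1,nd2,nd3,nd4,nd5 ./my_solver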

Sometimes it works, sometimes it gives inaccurate results, and sometimes it crashes with the error:

"[0:nd1] unexpected disconnect completion event from [35:nd2] Fatal error in PMPI_Comm_dup: Internal MPI error!, error stack ...". 

Any suggestions on how to fix this communication error when running on multiple nodes with Intel MPI (2017 Update 2)?

I have already set the stack size to unlimited in my shell rc file. I tested this with two different applications (one of them is the well-known distributed-memory solver MUMPS) and I see the same issue with both. The job is not very memory-demanding. mpirun works perfectly on 1 node; the problem only appears on multiple nodes (even 2).
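
For completeness, the stack-size setting in the rc file is just the usual line (assuming a bash-style shell):

ulimit -s unlimited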

Thanks

 

