I have a coarray program that I compile for distributed-memory execution.
I then run it on a single 16-core node with different numbers of processes.
It runs fine with 2, 4 and 8 processes, but gives the following error with 16 processes.
Does the error message give any clue as to the cause?
Thanks
Anton
===> co_back1.x
-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 2 ./co_back1.x
188.85user 29.04system 1:55.30elapsed 188%CPU (0avgtext+0avgdata 66640maxresident)k
1624inputs+945432outputs (2major+13770minor)pagefaults 0swaps
===> co_back1.x
-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 4 ./co_back1.x
263.14user 94.69system 2:51.91elapsed 208%CPU (0avgtext+0avgdata 71376maxresident)k
0inputs+2791464outputs (0major+22881minor)pagefaults 0swaps
===> co_back1.x
-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 8 ./co_back1.x
420.93user 292.96system 2:41.95elapsed 440%CPU (0avgtext+0avgdata 88192maxresident)k
0inputs+8998288outputs (0major+48387minor)pagefaults 0swaps
===> co_back1.x
-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 16 ./co_back1.x
application called MPI_Abort(comm=0x84000000, 3) - process 0
[1:node43-038] unexpected disconnect completion event from [0:node43-038]
[1:node43-038][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:2482] Intel MPI fatal error: OpenIB-cma DTO operation posted for [0:node43-038] completed with error. status=0x1. cookie=0x1500080000
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c at line 2485: 0
internal ABORT - process 1