Channel: Clusters and HPC Technology
Viewing all articles
Browse latest Browse all 927

Help with MPI abort message

Hi all,

I'm pretty new to MPI usage and debugging.

I'm running the WRF model (Weather Research and Forecasting). After some successful outputs (i.e., the model runs as expected), I get the following message from several nodes, which causes the simulation to abort:

[n18:mpi_rank_89][handle_cqe] Send desc error in msg to 101, wc_opcode=0
[n18:mpi_rank_89][handle_cqe] Msg from 101: wc.status=12, wc.wr_id=0xba31140, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[n18:mpi_rank_89][handle_cqe] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:587: [] Got completion with error 12, vendor code=0x81, dest rank=101
: Numerical result out of range (34)

Can anyone please share some information about this message, or suggest how to dig deeper into this error? Note: I'm running the model built with -O3 optimization. Built with no optimization (-O0), the simulation doesn't abort at that point (though it was extremely slow, and I stopped it myself). I'm using the Intel Fortran compiler, version 14.0.2.144.
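
As one possible starting point for digging in, the numeric `wc.status` in the log above can be mapped to its symbolic name. This is only a sketch: the table below is copied from the `ibv_wc_status` enum in libibverbs' `<infiniband/verbs.h>`, and `decode_handle_cqe` is a hypothetical helper written for this post, not part of WRF or of the MPI library. Status 12 corresponds to `IBV_WC_RETRY_EXC_ERR` (transport retry counter exceeded), which usually means the remote side stopped acknowledging, so the fabric or the peer process (rank 101 here) may be worth checking first.

```python
import re

# Values of enum ibv_wc_status as declared in libibverbs' <infiniband/verbs.h>
# (only the first 14 entries are listed here).
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}


def decode_handle_cqe(line):
    """Hypothetical helper: pull the reporting rank and wc.status out of one
    handle_cqe log line and attach the symbolic status name."""
    rank = re.search(r"mpi_rank_(\d+)\]", line)
    status = re.search(r"wc\.status=(\d+)", line)
    if not (rank and status):
        return None  # line does not carry a wc.status field
    code = int(status.group(1))
    return {
        "rank": int(rank.group(1)),
        "wc_status": code,
        "name": IBV_WC_STATUS.get(code, "unknown"),
    }


if __name__ == "__main__":
    # Second line of the log in this post, copied verbatim.
    sample = ("[n18:mpi_rank_89][handle_cqe] Msg from 101: wc.status=12, "
              "wc.wr_id=0xba31140, wc.opcode=0, vbuf->phead->type=0 = "
              "MPIDI_CH3_PKT_EAGER_SEND")
    print(decode_handle_cqe(sample))
```

Running the same helper over the full job log and counting which ranks and nodes report the error might show whether the failures cluster on one host or link.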

Any suggestions on how to tackle this issue would be greatly appreciated. Thank you all in advance,

Jack.

   

 

