Hi,
We have compiled our parallel code with the latest Intel software stack. We make heavy use of passive-target RMA one-sided PUT/GET operations together with derived datatypes. We are now experiencing a problem: the application sometimes fails with either a segmentation fault or the following error message:
[6] Assertion failed in file ../../segment.c at line 669: cur_elmp->curcount >= 0
[6] internal ABORT - process 6
Intel Inspector reports a problem inside the Intel MPI library:
libmpi_dbg.so.4!MPID_Segment_blkidx_m2m - segment_packunpack.c:313
libmpi_dbg.so.4!MPID_Segment_manipulate - segment.c:552
libmpi_dbg.so.4!MPID_Segment_unpack - segment_packunpack.c:88
libmpi_dbg.so.4!MPIDI_CH3U_Receive_data_found - ch3u_handle_recv_pkt.c:190
libmpi_dbg.so.4!MPIDI_CH3_PktHandler_GetResp - ch3u_rma_sync.c:3691
libmpi_dbg.so.4!MPID_nem_handle_pkt - ch3_progress.c:1477
libmpi_dbg.so.4!MPIDI_CH3I_Progress - ch3_progress.c:498
libmpi_dbg.so.4!MPIDI_Win_unlock - ch3u_rma_sync.c:1959
libmpi_dbg.so.4!PMPI_Win_unlock - win_unlock.c:119
Does this mean that something is wrong with the derived datatypes? If so, how can I debug the problem? The problem never appears with OpenMPI.
The software stack used:
Intel C/Fortran compilers v15.0.0.090
Intel MPI Library v5.0.1.035
Any help will be greatly appreciated!
Best,
Victor.