Channel: Clusters and HPC Technology

Segfault in DAPL with Mellanox OFED 2.1


Hi,

The Intel MPI library has been crashing ever since we updated to the latest Mellanox OFED (2.1). For example, the test program supplied with Intel MPI (test/test.f90) crashes with a segfault. I compiled it using

mpif90 -debug all /apps/intel-mpi/4.1.1.036/test/test.f90 -o test.x

and managed to get a backtrace from the crash using idbc:

#0  0x00007fcb9418f078 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#1  0x00007fcb94190bf7 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#2  0x00007fcb94191543 in MPID_nem_dapl_rc_init_20 () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#3  0x00007fcb941de883 in MPID_nem_dapl_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#4  0x00007fcb94276fc6 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#5  0x00007fcb9427547c in MPID_nem_init_ckpt () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#6  0x00007fcb94276ca7 in MPID_nem_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#7  0x00007fcb94128070 in MPIDI_CH3_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#8  0x00007fcb94265bad in MPID_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#9  0x00007fcb9423c38f in MPIR_Init_thread () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#10 0x00007fcb94230258 in PMPI_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#11 0x00007fcb946f331f in pmpi_init__ () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpigf.so.4
#12 0x0000000000403005 in main () at /apps/intel-mpi/4.1.1.036/test/test.f90:28
#13 0x0000000000402fbc in main ()

We are running CentOS 6.5.

Cheers,

Ben

