Hi all,
I am trying to call MPI from within OpenMP regions, but I cannot get it to work properly. My program compiles fine with mpiicc (4.1.1.036) and icc (13.1.2 20130514), and I checked that it is linked against the thread-safe libraries (libmpi_mt.so appears in the ldd output).
But when I try to run it (2 Ivy Bridge nodes x 2 MPI tasks x 12 OpenMP threads), I get a SIGSEGV without any backtrace:
/opt/softs/intel/impi/4.1.1.036/intel64/bin/mpirun -np 4 -ppn 2 ./mpitest.x
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
Or with the debug level set to 5:
/opt/softs/intel/impi/4.1.1.036/intel64/bin/mpirun -np 4 -ppn 2 ./mpitest.x
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): shm and dapl data transfer modes
[2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[3] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[2] MPI startup(): shm and dapl data transfer modes
[3] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 90871 beaufix522 {0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 1 90872 beaufix522 {12,13,14,15,16,17,18,19,20,21,22,23,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 2 37690 beaufix523 {0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,30,31,32,33,34,35}
[0] MPI startup(): 3 37691 beaufix523 {12,13,14,15,16,17,18,19,20,21,22,23,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,15,15,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 12
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
Of course, everything works fine if I use a single OpenMP thread. I also tried wrapping the MPI calls in critical regions; that works, but it is not what I want.
My program is just a small test case to figure out whether I can use this pattern inside a bigger program: within each MPI task, all OpenMP threads are first used to send messages to the other tasks, and afterwards all OpenMP threads are used to receive messages from the other tasks.
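To make the pattern concrete, here is a minimal sketch of what I mean (this is not the exact mpitest.x source; the tags, message sizes, and buffer contents are made up for illustration). Each rank requests MPI_THREAD_MULTIPLE, then every OpenMP thread sends one small message to every other rank, and in a second parallel region every thread receives one message from every other rank, using the thread id as the tag so the receives match:

```c
/* Sketch of the hybrid pattern (illustrative, not the original source).
 * Note: the send phase relies on eager buffering of small messages;
 * with large messages this all-send-then-all-receive order could block.
 * Compile e.g.: mpiicc -mt_mpi -qopenmp mpitest.c -o mpitest.x */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Phase 1: every OpenMP thread sends one message to every other
     * rank; the thread id is used as the message tag. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int buf = rank * 1000 + tid;           /* arbitrary payload */
        for (int dest = 0; dest < size; dest++)
            if (dest != rank)
                MPI_Send(&buf, 1, MPI_INT, dest, tid, MPI_COMM_WORLD);
    }

    /* Phase 2: every OpenMP thread receives one message from every
     * other rank, matching on its own thread id as the tag. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int buf;
        for (int src = 0; src < size; src++)
            if (src != rank)
                MPI_Recv(&buf, 1, MPI_INT, src, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

This assumes every rank runs the same number of OpenMP threads, so each tagged send has exactly one matching receive.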
My questions are:
- does my program conform to the MPI_THREAD_MULTIPLE thread level (which, by the way, is the level returned by MPI_Init_thread)?
- is Intel MPI supposed to run it correctly?
- if not, will it work someday?
- what can I do now (extra tests, etc.)?
Best regards,
Philippe