Hi,
The attached program simple_repro.c reproduces what I believe is a bug in the Intel MPI library, version 4.1.
In short, it spawns <num_threads> threads in each of 2 processes, such that thread i on rank 0 communicates with thread i on rank 1 over a communicator private to that pair. The only difference between the two processes is that the threads on rank 0 are coordinated with a semaphore (initialized to <sem_value>), so they cannot all be active at the same time; the threads on rank 1 run freely.
The problem is that if the communication between a pair of threads involves creating a child communicator via MPI_Comm_dup(), they are very likely to run into a deadlock, with <sem_value> pairs of threads stuck in (comm_dup, comm_dup) and the remaining <num_threads> - <sem_value> pairs stuck in (sem_wait, comm_dup). See the attached stack traces. This looks to me like a starvation problem.
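In case it helps to see the pattern without opening the attachment, here is a minimal sketch of the structure described above. It is not the attached simple_repro.c itself; names like worker(), thread_arg_t and the use of MPI_Sendrecv_replace are just placeholders for illustration, but the overall shape (one private communicator per thread pair dup'ed from MPI_COMM_WORLD, a semaphore gating the rank-0 threads, and an MPI_Comm_dup()/MPI_Comm_free() inside the per-repetition loop) is the one described above:

/* Hypothetical sketch of the reproducer pattern, not the attached file.
 * num_threads, num_reps and sem_value are passed via -D as in the compile
 * line below; build with something like
 *   mpigcc -mt_mpi -O3 -Dnum_threads=4 -Dnum_reps=10 -Dsem_value=1 sketch.c
 */
#include <mpi.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

static int rank;
static sem_t sem;                      /* only used on rank 0 */

typedef struct {
    int id;                            /* thread index i */
    MPI_Comm comm;                     /* private communicator for this pair */
} thread_arg_t;

static void *worker(void *p)
{
    thread_arg_t *arg = p;
    MPI_Comm child;
    int rep, token, peer = 1 - rank;   /* only 2 ranks involved */

    for (rep = 0; rep < num_reps; rep++) {
        if (rank == 0)
            sem_wait(&sem);            /* at most sem_value rank-0 threads proceed */

        /* Creating the child communicator is where the hang shows up. */
        MPI_Comm_dup(arg->comm, &child);

        /* Token exchange: thread i on rank 0 <-> thread i on rank 1. */
        token = arg->id;
        MPI_Sendrecv_replace(&token, 1, MPI_INT, peer, 0, peer, 0,
                             child, MPI_STATUS_IGNORE);

        MPI_Comm_free(&child);

        if (rank == 0)
            sem_post(&sem);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, i;
    pthread_t threads[num_threads];
    thread_arg_t args[num_threads];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        sem_init(&sem, 0, sem_value);  /* gate for the rank-0 threads */

    /* One private communicator per thread pair, duplicated serially here. */
    for (i = 0; i < num_threads; i++) {
        args[i].id = i;
        MPI_Comm_dup(MPI_COMM_WORLD, &args[i].comm);
    }
    for (i = 0; i < num_threads; i++)
        pthread_create(&threads[i], NULL, worker, &args[i]);
    for (i = 0; i < num_threads; i++) {
        pthread_join(threads[i], NULL);
        MPI_Comm_free(&args[i].comm);
    }

    if (rank == 0)
        sem_destroy(&sem);
    MPI_Finalize();
    return 0;
}

With sem_value=1, the rank-0 threads enter MPI_Comm_dup() one at a time while all rank-1 threads call it concurrently, and that asymmetry seems to be what triggers the starvation.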
$ mpigcc -mt_mpi -O3 -Dnum_threads=4 -Dnum_reps=10 -Dsem_value=1 simple_repro.c -o simple_repro
$ mpirun -n 2 `pwd`/simple_repro
[1] MPI startup(): shm data transfer mode
[0] MPI startup(): shm data transfer mode
[0] MPI startup(): Rank    Pid      Node name    Pin cpu
[0] MPI startup(): 0       24808    localhost    {0,1,4,5}
[0] MPI startup(): 1       24809    localhost    {2,3,6,7}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 2
...HANGS...
The exact same program never hangs with impi 5.0, not even with high values for <num_threads> and <num_reps>.
Can anybody confirm this is a library issue that has been fixed in version 5.0?
Thank you!