Hi,
Trying to make use of the MPI persistent communication primitives in our application, I'm ending up with the following sequence of events:
    MPI_Ssend_init(msg, msg_length, MPI_BYTE, 0, tag, comm, &req);
    MPI_Start(&req);
    MPI_Cancel(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req); // <-- HANGS
The only other node is blocked in MPI_Barrier(comm) at that point.
I noticed that if I comment out the MPI_Barrier() call and let the other node proceed to freeing the communicator and then enter some other MPI_Barrier() on a different communicator, then the MPI_Request_free() call magically returns.
I tried reproducing this in a separate test program, but everything works as expected there. So I understand that there is probably some (possibly unrelated) bug in my original application that causes this behaviour, and that more information would be needed to track it down.
But what puzzles me is that MPI_Request_free() blocks, even though the standard says that it is supposed to be a local operation (i.e. its completion should not depend on any other nodes).
So my main questions are:
- What can MPI_Request_free() possibly be waiting for?
- Any ideas/suggestions for how I can best debug this sort of issue?
Thanks in advance!
- Adrian