Hi,
Below is a simple reproduction case for the issue we're facing:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank;
    MPI_Group group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_group(MPI_COMM_WORLD, &group);

    if (rank == 0) {
        printf("rank 0: about to send\n");
        MPI_Ssend(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0: send completed\n");
    } else {
        MPI_Request req[2];
        int which;

        MPI_Isend(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);

        if (which == 0) {
            printf("rank 1: send succeeded; cancelling receive request\n");
            MPI_Cancel(&req[1]);
            MPI_Wait(&req[1], MPI_STATUS_IGNORE);
        } else {
            printf("rank 1: receive succeeded; cancelling send request\n");
            MPI_Cancel(&req[0]);
            MPI_Wait(&req[0], MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}
This program outputs the following, after which it hangs indefinitely:
rank 0: about to send
rank 1: send succeeded; cancelling receive request
I understand that this is caused by the "eager completion" of MPI_Isend() on rank 1: the send completes locally even though rank 0 never posts a matching receive, so MPI_Waitany() returns index 0. I also understand that the behaviour of a program that initiates an unmatched operation is undefined. However, I don't believe that applies here, since I do eventually call MPI_Cancel() on the request. If that were not enough, wouldn't it imply that a program that simply does MPI_Isend(...); MPI_Cancel(...); MPI_Wait(...); is also incorrect?
I also noticed that changing the MPI_Isend() into MPI_Issend() makes the program work as expected:
rank 0: about to send
rank 0: send completed
rank 1: receive succeeded; cancelling send request
So, to keep it short, my questions are:
- Is the initial (MPI_Isend()) version of my program an incorrect MPI program, whose behaviour is undefined?
- If so, then could you please explain why and point me to the relevant section of the MPI standard or any other resources that would clarify these matters for me?
- Is the MPI_Issend() version of my program also incorrect?
- If MPI_Issend() still doesn't make the program correct, can I at least be sure that, with the Intel implementation, it will always work as expected? Or is it just a coincidence that it does?
Many thanks to anyone willing to help me with this!
- Adrian