The structure of my code is as follows:
//part1
if (i > 1) {
    Compute1;
}
//part2
if (i < m) {
    Compute2;
    MPI_Allgatherv(); //Replaced by MPI_Iallgatherv();
}
//part3
if (i > 0) {
    Compute3;
    MPI_Allreduce();
}
//part4
if (i < m) {
    Compute4;
}
The collective operation in part 2 is the bottleneck of this program.
I replaced "MPI_Allgatherv()" with the non-blocking collective (NBC) "MPI_Iallgatherv()" in order to overlap the collective communication with the computation in part 3 and part 4. However, part 3 and part 4 now take much longer than before. What could be the cause of this?
Thanks!