Hi All,
I am writing to ask about CFD model results that differ depending on the order of the hosts in mpi_hosts.
I would like to hear your opinions at a theoretical level, because the code is long and complex and it would be difficult to reproduce the problem with a small sample code.
The current situation is that when nodes on different InfiniBand switches run a parallel computation together, case #1 works well but case #2 does not.
By "does not work well" I mean that the computed values differ between the two cases.
Background: host01-host04 are on IB switch #1 and host99 is on IB switch #2.
Case #1: host01, host02, host03, host04, host99 (i.e. the head node is host01)
Case #2: host99, host01, host02, host03, host04 (i.e. the head node is host99)
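To make the two cases concrete, here is a small sketch of how the rank-to-host mapping would change between them. It assumes one MPI rank per host and that ranks are assigned in the order the hosts are listed; both are assumptions, since the real mapping depends on the launcher and its slot settings. The helper names are hypothetical.

```python
# Illustration only: assumes one rank per host, assigned in list order.
case1 = ["host01", "host02", "host03", "host04", "host99"]
case2 = ["host99", "host01", "host02", "host03", "host04"]


def rank_map(hosts):
    """Return {rank: host}, assuming rank i runs on the i-th listed host."""
    return dict(enumerate(hosts))


def moved_ranks(a, b):
    """Return the ranks whose host differs between two host orderings."""
    map_a, map_b = rank_map(a), rank_map(b)
    return sorted(r for r in map_a if map_a[r] != map_b.get(r))


if __name__ == "__main__":
    for rank in range(len(case1)):
        print(rank, rank_map(case1)[rank], rank_map(case2)[rank])
    print("ranks on a different host:", moved_ranks(case1, case2))
```

Under these assumptions, it is not only rank 0 that moves: every rank lands on a different host in case #2, so any work that is assigned by rank would run on a different machine.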
My guesses so far (these are hypothetical scenarios with no theoretical basis):
1) There are communication problems when the head node is on a different switch.
2) The rank values (myrank) get reversed when MPI_COMM_RANK is called several times.
3) There is some problem (corruption or a mismatch) in MPI_COMM_WORLD.
First of all, for debugging, I am putting print statements in several places to see which subroutine or function changes the value.
(I'll post more when the situation is updated.)
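For the print-statement debugging described above, here is a minimal sketch of how the output of the two runs could be compared. It assumes that each rank prints one line per checkpoint in the form "<label> <rank> <value>", where the value is some checksum of the data at that point. The log format and the function names are hypothetical, not something the CFD code already produces.

```python
import math


def parse_log(lines):
    """Parse '<label> <rank> <value>' lines into {(label, rank): value}."""
    records = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a checkpoint line
        try:
            rank, value = int(parts[1]), float(parts[2])
        except ValueError:
            continue  # skip lines whose rank or value is not numeric
        records[(parts[0], rank)] = value
    return records


def first_divergence(good_lines, bad_lines, rel_tol=1e-12):
    """Return (key, good, bad) for the first checkpoint that differs, else None.

    Checkpoints are visited in the order they appear in the good run.
    """
    good = parse_log(good_lines)
    bad = parse_log(bad_lines)
    for key, g in good.items():
        b = bad.get(key)
        if b is None or not math.isclose(g, b, rel_tol=rel_tol, abs_tol=0.0):
            return key, g, b
    return None


if __name__ == "__main__":
    case1_log = ["init 0 1.0", "init 1 2.0", "flux 0 3.5", "flux 1 4.25"]
    case2_log = ["init 0 1.0", "init 1 2.0", "flux 0 3.5", "flux 1 4.75"]
    print(first_divergence(case1_log, case2_log))
```

The rel_tol parameter is there so that last-digit rounding differences can be told apart from larger discrepancies; the default value is only a placeholder and would need to be chosen for the actual data.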
However, whichever function I end up finding, I am not sure the problem can be resolved at the code level, so I am posting to the forum to hear about any similar experiences.
Thank you.