I am running performance benchmarks with C++ and Intel MPI on CentOS 7.
I use two types of benchmarks:
(1) Uni-band: https://software.intel.com/en-us/node/561907
The first half of the ranks communicates with the second half using MPI_Isend/MPI_Recv/MPI_Wait calls. If the number of processes is odd, one rank does not participate in the message exchange. Each rank in the first half issues a batch of MPI_Isend calls to its counterpart in the second half.
(2) Bi-band: https://software.intel.com/en-us/node/561908
The first half of the ranks communicates with the second half using MPI_Isend/MPI_Recv/MPI_Wait calls. If the number of processes is odd, one rank does not participate in the message exchange. Each rank in the first half issues a batch of MPI_Isend calls to its counterpart in the second half, and vice versa, so traffic flows in both directions at the same time (a sketch of both patterns follows this list).
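To make the difference between the two patterns concrete, here is a minimal sketch of the communication pattern as I understand it. This is not the Intel MPI Benchmarks source; MSG_SIZE, WINDOW, and the bidirectional flag are illustrative assumptions, and the timing/bandwidth calculation is omitted.

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int  half          = size / 2;   // with an odd size, the last rank idles
    const int  MSG_SIZE      = 1 << 20;    // assumed message size (1 MiB)
    const int  WINDOW        = 64;         // assumed number of outstanding sends
    const bool bidirectional = true;       // false = uni-band, true = bi-band

    std::vector<char> sendbuf(MSG_SIZE), recvbuf(MSG_SIZE);

    if (rank < 2 * half) {
        const int  peer     = (rank < half) ? rank + half : rank - half;
        const bool in_first = (rank < half);

        // Uni-band: only the first half sends and only the second half receives.
        // Bi-band: every paired rank both sends and receives, so traffic crosses
        // each link in both directions at the same time.
        const bool do_send = in_first || bidirectional;
        const bool do_recv = !in_first || bidirectional;

        std::vector<MPI_Request> send_reqs(WINDOW);

        // Post the window of non-blocking sends first (these never block)...
        if (do_send)
            for (int i = 0; i < WINDOW; ++i)
                MPI_Isend(sendbuf.data(), MSG_SIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &send_reqs[i]);

        // ...then drain the matching messages with blocking receives...
        if (do_recv)
            for (int i = 0; i < WINDOW; ++i)
                MPI_Recv(recvbuf.data(), MSG_SIZE, MPI_CHAR, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // ...and finally wait for all outstanding sends to complete.
        if (do_send)
            MPI_Waitall(WINDOW, send_reqs.data(), MPI_STATUSES_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

In the bi-band case each link carries WINDOW messages in each direction concurrently, which is where I expected a full-duplex link to deliver roughly twice the uni-band throughput.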
Since the Ethernet on my machines supports full-duplex transmission, I expected the bi-band results to show almost twice the maximum bandwidth of uni-band.
(Full-duplex operation doubles the theoretical bandwidth of the connection: a link that normally runs at 1 Mbps really has 2 Mbps of bandwidth in full-duplex mode, 1 Mbps in each direction.)
However, what I observe is that even though the Ethernet supports full duplex, the bandwidth drops to about half when communication flows in both directions (Rank-A <----> Rank-B) at the same time.
It seems as if Intel MPI does not take advantage of full-duplex transmission. Is there a way to resolve this?