Dear all,
I have some questions about an MPI_ISEND/MPI_IRECV bottleneck.
Below is my subroutine named "Broadcast_boundary":
DO NP=1,NPROCS-1
  IF(MYRANK==NP-1)THEN
    CALL MPI_ISEND(ARRAY_1D(L/2-NUMBER),NUMBER,MPI_REAL,NP  ,101,MPI_COMM_WORLD,IREQ1,IERR)
    CALL MPI_IRECV(ARRAY_1D(L/2)       ,NUMBER,MPI_REAL,NP  ,102,MPI_COMM_WORLD,IREQ2,IERR)
    CALL MPI_WAIT(IREQ1,STATUS1,IERR)
    CALL MPI_WAIT(IREQ2,STATUS2,IERR)
  ELSEIF(MYRANK==NP)THEN
    CALL MPI_ISEND(ARRAY_1D(L/2)       ,NUMBER,MPI_REAL,NP-1,102,MPI_COMM_WORLD,IREQ1,IERR)
    CALL MPI_IRECV(ARRAY_1D(L/2-NUMBER),NUMBER,MPI_REAL,NP-1,101,MPI_COMM_WORLD,IREQ2,IERR)
    CALL MPI_WAIT(IREQ1,STATUS1,IERR)
    CALL MPI_WAIT(IREQ2,STATUS2,IERR)
  ENDIF
ENDDO
The code is designed to exchange boundary data between ranks NP-1 and NP, for each neighbouring pair from rank 0 up to NPROCS-1.
And here is my sample program:
L=20000; NUM=500
ALLOCATE(A(L),B(L))
CALL RANDOM_NUMBER(A)
CALL RANDOM_NUMBER(B)

CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
TIC=MPI_WTIME()
CALL BROADCAST_BOUNDARY(A,NUM)
TOC=MPI_WTIME()
MPI_WTIMES(1)=TOC-TIC

CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
TIC=MPI_WTIME()
CALL BROADCAST_BOUNDARY(B,NUM)
TOC=MPI_WTIME()
MPI_WTIMES(2)=TOC-TIC
As far as I understand, mpi_wtimes(1) (the elapsed time to communicate array A) and mpi_wtimes(2) (array B) should be nearly the same, because A and B have the same size.
But after several experiments, I found that mpi_wtimes(1) is about eight times larger than mpi_wtimes(2).
Please let me know whether there is some initialization cost in the first MPI communication, whether something in the first communication speeds up the later ones, or whether my code needs to be improved.
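For example, would adding an untimed warm-up exchange before the measured calls be a reasonable way to rule out first-communication setup costs? Here is a minimal sketch of what I mean (my own assumption, not tested; DUMMY is a hypothetical scratch array of the same size as A):

REAL, ALLOCATABLE :: DUMMY(:)

! Untimed warm-up exchange, so that any one-time setup work in the first
! communication happens before the measurement (my assumption).
ALLOCATE(DUMMY(L))
DUMMY = 0.0
CALL BROADCAST_BOUNDARY(DUMMY,NUM)

! Then time array A exactly as before.
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
TIC=MPI_WTIME()
CALL BROADCAST_BOUNDARY(A,NUM)
TOC=MPI_WTIME()
MPI_WTIMES(1)=TOC-TIC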
I'll attach the entire code I tested. If you could run it and share your results, I think that would answer many of my questions.