Hello,
I am experiencing issues while using MPI_Sendrecv on multiple machines. The code sends a vector around a ring of processes in parallel: each process sends data to the subsequent process and receives data from the preceding one. Surprisingly, the first execution of the SEND_DATA routine produces correct output, while the second execution produces incorrect output. The code and the output are below.
PROGRAM SENDRECV_REPROD
  USE MPI
  USE ISO_FORTRAN_ENV, ONLY: INT32
  IMPLICIT NONE
  INTEGER(KIND=INT32) :: STATUS(MPI_STATUS_SIZE)
  INTEGER(KIND=INT32) :: RANK, NUM_PROCS, IERR

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NUM_PROCS, IERR)

  CALL SEND_DATA(RANK, NUM_PROCS)
  CALL SEND_DATA(RANK, NUM_PROCS)

  CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
  CALL MPI_FINALIZE(IERR)
END PROGRAM

SUBROUTINE SEND_DATA(RANK, NUM_PROCS)
  USE ISO_FORTRAN_ENV, ONLY: INT32, REAL64
  USE MPI
  IMPLICIT NONE
  INTEGER(KIND=INT32), INTENT(IN) :: RANK
  INTEGER(KIND=INT32), INTENT(IN) :: NUM_PROCS
  INTEGER(KIND=INT32) :: IERR, ALLOC_ERROR
  INTEGER(KIND=INT32) :: VEC_SIZE, I_RANK, RANK_DESTIN, RANK_SOURCE, TAG_SEND, TAG_RECV
  REAL(KIND=REAL64), ALLOCATABLE :: COMM_BUFFER(:), VEC1(:)
  INTEGER(KIND=INT32) :: MPI_COMM_STATUS(MPI_STATUS_SIZE)

  ! Allocate communication arrays.
  VEC_SIZE = 374454
  ALLOCATE(COMM_BUFFER(VEC_SIZE), STAT=ALLOC_ERROR)
  ALLOCATE(VEC1(VEC_SIZE), STAT=ALLOC_ERROR)

  ! Define destination and source ranks for sending and receiving messages.
  RANK_DESTIN = MOD((RANK+1), NUM_PROCS)
  RANK_SOURCE = MOD((RANK+NUM_PROCS-1), NUM_PROCS)
  TAG_SEND = RANK+1
  TAG_RECV = RANK
  IF (RANK==0) TAG_RECV = NUM_PROCS

  VEC1 = RANK
  COMM_BUFFER = 0.0_REAL64

  CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)

  DO I_RANK = 1, NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R', RANK, VEC1(1), 'B', COMM_BUFFER(1)
  ENDDO

  CALL MPI_SENDRECV(VEC1(1), VEC_SIZE, MPI_DOUBLE_PRECISION, RANK_DESTIN, TAG_SEND, &
                    COMM_BUFFER(1), VEC_SIZE, MPI_DOUBLE_PRECISION, RANK_SOURCE, TAG_RECV, &
                    MPI_COMM_WORLD, MPI_COMM_STATUS, IERR)

  DO I_RANK = 1, NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R', RANK, VEC1(1), 'A', COMM_BUFFER(1)
  ENDDO
END SUBROUTINE SEND_DATA
Output of four processes run on four machines:
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 3.00000000000000
R 1 1.00000000000000 A 0.000000000000000E+000
R 2 2.00000000000000 A 1.00000000000000
R 3 3.00000000000000 A 2.00000000000000
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 2.00000000000000
R 1 1.00000000000000 A 3.00000000000000
R 2 2.00000000000000 A 0.000000000000000E+000
R 3 3.00000000000000 A 1.00000000000000
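For comparison, here is a minimal sketch (in Python rather than the original Fortran, and only modeling the rank arithmetic, not actual MPI communication) of what each rank should receive after a single ring shift. It matches the first SEND_DATA call above, where rank 0 receives 3.0, rank 1 receives 0.0, and so on, while the second call's output is shifted by two positions instead of one.

```python
def ring_exchange(num_procs):
    """Model one ring shift: rank r sends its own value r to rank
    (r+1) mod num_procs, so rank r should receive (r-1) mod num_procs."""
    sent = {r: float(r) for r in range(num_procs)}
    # Destination of rank r is (r+1) mod num_procs, mirroring
    # RANK_DESTIN = MOD(RANK+1, NUM_PROCS) in the reproducer.
    return {(r + 1) % num_procs: sent[r] for r in range(num_procs)}

expected = ring_exchange(4)
print(expected)  # {1: 0.0, 2: 1.0, 3: 2.0, 0: 3.0}
```

This is just the expected single-shift pattern; the puzzle is why the second call on multiple machines deviates from it.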
As you can see, the output of the first SEND_DATA execution differs from that of the second. The results are correct if I run the reproducer on a single machine with multiple processes. I am compiling the code with mpiifort from the Intel(R) MPI Library 2017 Update 3 for Linux* (ifort version 17.0.4), and running with mpirun from the Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405.
Do you have any idea what could be the source of this issue?
Thank you,
Piotr