Dear developers of IMPI,
I have observed a bug when using the mpi_f08 module with IMPI-5.1.1.109 and intel-16.0.0
while running a Fortran program with 2 MPI processes on a Linux cluster.
Data are not transmitted correctly by MPI_GATHER and by MPI_BCAST.
a) MPI_GATHER: the following bug occurred only with the mpi_f08 module, whereas with the mpi module the same code worked.
A simplified code snippet looks like this:
integer , parameter :: mxhostlen=128
character(len=mxhostlen) :: HOST_NAME
character(len=mxhostlen), allocatable, dimension(:) :: nodename_from_irankWORLD
! Note: numprocsWORLD is the number of MPI-procs running
if(lmaster) then ! <-- rank 0
allocate( nodename_from_irankWORLD(0:numprocsWORLD-1) )
else
allocate( nodename_from_irankWORLD(0:0) ) ! for saving storage on the slave procs
endif
call MPI_GATHER( HOST_NAME , mxhostlen, MPI_CHARACTER &
,nodename_from_irankWORLD, mxhostlen, MPI_CHARACTER &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
Using this to gather the hostnames from each process on the master process, I get:
[0] Fatal error in PMPI_Gather: Message truncated, error stack:
[0] PMPI_Gather(1303).......: MPI_Gather(sbuf=0xa8e160, scount=128, MPI_CHARACTER, rbuf=0x26402d0, rcount=1, MPI_CHARACTER, root=0, MPI_COMM_WORLD) failed
[0] MPIR_Gather_impl(728)...:
[0] MPIR_Gather(682)........:
[0] I_MPIR_Gather_intra(822):
[0] MPIR_Gather_intra(187)..:
[0] MPIR_Localcopy(125).....: Message truncated; 128 bytes received but buffer size is 1
You can see that the value of rcount is 1, but it should be 128 (= mxhostlen).
If I change the MPI_GATHER call into this statement:
call MPI_GATHER( HOST_NAME , mxhostlen, MPI_CHARACTER &
,nodename_from_irankWORLD(0), mxhostlen, MPI_CHARACTER &
,0_INT4, MPI_COMM_WORLD, ierr_mpi )
then it works. Nevertheless, this is also a bug, because it must not make any difference
whether the receive choice buffer is passed as the whole array,
as its first array element,
or as a (sufficiently long) scalar variable.
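For completeness, a minimal stand-alone reproducer for case a) could look like the following sketch (the program name, the use of get_environment_variable to fill HOST_NAME, and the plain integer root are simplifications for the test, not the original code):

program gather_repro
   use mpi_f08                        ! with "use mpi" instead, the call works
   implicit none
   integer, parameter :: mxhostlen = 128
   character(len=mxhostlen)                            :: HOST_NAME
   character(len=mxhostlen), allocatable, dimension(:) :: nodename_from_irankWORLD
   integer :: myrankWORLD, numprocsWORLD, ierr_mpi

   call MPI_INIT( ierr_mpi )
   call MPI_COMM_RANK( MPI_COMM_WORLD, myrankWORLD,   ierr_mpi )
   call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocsWORLD, ierr_mpi )

   HOST_NAME = ' '
   call get_environment_variable( 'HOSTNAME', HOST_NAME )   ! stand-in for the real hostname query

   if (myrankWORLD == 0) then          ! <-- master
      allocate( nodename_from_irankWORLD(0:numprocsWORLD-1) )
   else
      allocate( nodename_from_irankWORLD(0:0) )              ! for saving storage on the slave procs
   endif

   ! Passing the whole array as receive buffer aborts with "Message truncated";
   ! passing nodename_from_irankWORLD(0) instead works.
   call MPI_GATHER( HOST_NAME               , mxhostlen, MPI_CHARACTER &
                   ,nodename_from_irankWORLD, mxhostlen, MPI_CHARACTER &
                   ,0, MPI_COMM_WORLD, ierr_mpi )

   if (myrankWORLD == 0) print '(A)', nodename_from_irankWORLD

   call MPI_FINALIZE( ierr_mpi )
end program gather_repro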
b) MPI_BCAST: the following bug occurred only with the mpi_f08 module, whereas with the mpi module the same code worked.
A simplified code snippet looks like this:
integer , parameter :: mxpathlen=512
character(len=mxpathlen), save :: CONF_DIR
character(len=mxpathlen), dimension(1) :: cbuffarr
integer :: nelem, lenelem, lentot, ierr_mpi
nelem=1
lenelem=mxpathlen
lentot= nelem * lenelem ! total number of characters to be transmitted
!!! call MPI_BCAST( CONF_DIR , lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi ) ! <--did work
cbuffarr(1)= CONF_DIR
call MPI_BCAST( cbuffarr , lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi ) ! <--did not work
!!! call MPI_BCAST( cbuffarr(1), lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi ) ! <--did work
CONF_DIR=cbuffarr(1)
Using this to transmit a string from the master to all slaves,
I do not get an error message, but the string sent is not received on the slaves!
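For completeness, a minimal stand-alone reproducer for case b) could look like this sketch (the test string assigned to CONF_DIR on the master and the final printout are only illustrations):

program bcast_repro
   use mpi_f08                        ! with "use mpi" instead, the broadcast arrives correctly
   implicit none
   integer, parameter :: mxpathlen = 512
   character(len=mxpathlen)               :: CONF_DIR
   character(len=mxpathlen), dimension(1) :: cbuffarr
   integer :: myrankWORLD, nelem, lenelem, lentot, ierr_mpi

   call MPI_INIT( ierr_mpi )
   call MPI_COMM_RANK( MPI_COMM_WORLD, myrankWORLD, ierr_mpi )

   CONF_DIR = ' '
   if (myrankWORLD == 0) CONF_DIR = '/some/test/config/dir'   ! arbitrary test value on the master

   nelem   = 1
   lenelem = mxpathlen
   lentot  = nelem * lenelem           ! total number of characters to be transmitted

   cbuffarr(1) = CONF_DIR
   ! Broadcasting the whole array reports no error, but the slaves do not
   ! receive the string; broadcasting cbuffarr(1) or CONF_DIR directly works.
   call MPI_BCAST( cbuffarr, lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi )
   CONF_DIR = cbuffarr(1)

   print '(A,I0,2A)', 'rank ', myrankWORLD, ' received: ', trim(CONF_DIR)

   call MPI_FINALIZE( ierr_mpi )
end program bcast_repro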
Possibly these bugs are also present in the interfaces of other MPI routines of the mpi_f08 module?
Greetings
Michael