
Bug in mpi_f08 module with IMPI-5.1.1.109 & Intel-16.0.0 on Linux cluster


Dear developers of IMPI,

 

I observed a bug when using the mpi_f08 module with IMPI-5.1.1.109 and Intel-16.0.0 while running a Fortran program with 2 MPI processes on a Linux cluster. Data are not transmitted correctly by MPI_GATHER and by MPI_BCAST.

 

 

a) MPI_GATHER: The following bug occurred only with the mpi_f08 module, whereas with the mpi module it worked.

 

A simplified code snippet looks like this:

 

integer, parameter :: mxhostlen=128
character(len=mxhostlen) :: HOST_NAME
character(len=mxhostlen), allocatable, dimension(:) :: nodename_from_irankWORLD

! Note: numprocsWORLD is the number of MPI procs running
if(lmaster) then   ! <-- rank 0
   allocate( nodename_from_irankWORLD(0:numprocsWORLD-1) )
else
   allocate( nodename_from_irankWORLD(0:0) )   ! for saving storage on the slave procs
endif

call MPI_GATHER( HOST_NAME                , mxhostlen, MPI_CHARACTER &
               , nodename_from_irankWORLD , mxhostlen, MPI_CHARACTER &
               , 0_INT4, MPI_COMM_WORLD, ierr_mpi )

 

Using this to gather the hostname of each process on the master process, I get:

 

[0] Fatal error in PMPI_Gather: Message truncated, error stack:
[0] PMPI_Gather(1303).......: MPI_Gather(sbuf=0xa8e160, scount=128, MPI_CHARACTER, rbuf=0x26402d0, rcount=1, MPI_CHARACTER, root=0, MPI_COMM_WORLD) failed
[0] MPIR_Gather_impl(728)...:
[0] MPIR_Gather(682)........:
[0] I_MPIR_Gather_intra(822):
[0] MPIR_Gather_intra(187)..:
[0] MPIR_Localcopy(125).....: Message truncated; 128 bytes received but buffer size is 1

 

You can see that the value of rcount is 1, but it should be 128 (= mxhostlen).

 

If I change the MPI_GATHER call to this statement:

call MPI_GATHER( HOST_NAME                   , mxhostlen, MPI_CHARACTER &
               , nodename_from_irankWORLD(0) , mxhostlen, MPI_CHARACTER &
               , 0_INT4, MPI_COMM_WORLD, ierr_mpi )

then it works. Nevertheless, this is also a bug, because it must make no difference whether the receive (choice) buffer is passed as the array itself, as its first array element, or as a (sufficiently long) scalar variable.
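
For completeness, below is a minimal, self-contained reproducer sketch of case a), assuming it is started with 2 or more MPI processes. The program wrapper, the variable myrankWORLD, the loop over ranks and the dummy hostname string are only my additions for illustration; the declarations and the MPI_GATHER call mirror the snippet above (the root argument is written as a plain 0 so the sketch does not depend on an INT4 kind constant).

program gather_repro
   use mpi_f08
   implicit none
   integer, parameter :: mxhostlen=128
   character(len=mxhostlen) :: HOST_NAME
   character(len=mxhostlen), allocatable, dimension(:) :: nodename_from_irankWORLD
   integer :: myrankWORLD, numprocsWORLD, irank, ierr_mpi

   call MPI_INIT( ierr_mpi )
   call MPI_COMM_RANK( MPI_COMM_WORLD, myrankWORLD  , ierr_mpi )
   call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocsWORLD, ierr_mpi )

   ! dummy hostname, so the sketch does not depend on MPI_GET_PROCESSOR_NAME
   write(HOST_NAME,'(A,I0)') 'host-of-rank-', myrankWORLD

   if( myrankWORLD == 0 ) then
      allocate( nodename_from_irankWORLD(0:numprocsWORLD-1) )
   else
      allocate( nodename_from_irankWORLD(0:0) )   ! dummy buffer on the slave procs
   endif

   ! Passing the whole array as the receive buffer triggers the
   ! "Message truncated" error shown above; passing
   ! nodename_from_irankWORLD(0) instead works.
   call MPI_GATHER( HOST_NAME                , mxhostlen, MPI_CHARACTER &
                  , nodename_from_irankWORLD , mxhostlen, MPI_CHARACTER &
                  , 0, MPI_COMM_WORLD, ierr_mpi )

   if( myrankWORLD == 0 ) then
      do irank = 0, numprocsWORLD-1
         print '(A,I0,2A)', 'rank ', irank, ' runs on ', trim(nodename_from_irankWORLD(irank))
      enddo
   endif

   call MPI_FINALIZE( ierr_mpi )
end program gather_repro

Such a program aborts with the "Message truncated" error shown above when the whole array is passed, and runs cleanly when nodename_from_irankWORLD(0) is passed instead. It can be built and run e.g. with mpiifort and mpirun -np 2 (command names depend on the installation).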


b) MPI_BCAST: The following bug occurred only with the mpi_f08 module, whereas with the mpi module it worked.

 

A simplified code snippet looks like this:

 

integer, parameter :: mxpathlen=512
character(len=mxpathlen), save :: CONF_DIR
character(len=mxpathlen), dimension(1) :: cbuffarr
integer :: nelem, lenelem, lentot, ierr_mpi

nelem  = 1
lenelem= mxpathlen
lentot = nelem * lenelem   ! total number of characters to be transmitted

!!! call MPI_BCAST( CONF_DIR   , lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi )   ! <-- did work
cbuffarr(1)= CONF_DIR
call MPI_BCAST( cbuffarr   , lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi )       ! <-- did not work
!!! call MPI_BCAST( cbuffarr(1), lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi )   ! <-- did work
CONF_DIR= cbuffarr(1)

 

Using this to transmit a string from the master to all slaves, I get no error message, but the string sent is not received on the slaves!
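
Again for completeness, here is a minimal, self-contained reproducer sketch of case b). The program wrapper, the rank variable and the example path string are my additions for illustration; the declarations and the broadcast sequence mirror the snippet above.

program bcast_repro
   use mpi_f08
   implicit none
   integer, parameter :: mxpathlen=512
   character(len=mxpathlen) :: CONF_DIR
   character(len=mxpathlen), dimension(1) :: cbuffarr
   integer :: nelem, lenelem, lentot, myrankWORLD, ierr_mpi

   call MPI_INIT( ierr_mpi )
   call MPI_COMM_RANK( MPI_COMM_WORLD, myrankWORLD, ierr_mpi )

   CONF_DIR = ' '
   if( myrankWORLD == 0 ) CONF_DIR = '/example/config/dir/set/only/on/the/master'

   nelem   = 1
   lenelem = mxpathlen
   lentot  = nelem * lenelem   ! total number of characters to be transmitted

   cbuffarr(1) = CONF_DIR
   ! Passing the whole array cbuffarr reports no error, but the slaves
   ! receive nothing; cbuffarr(1) (or CONF_DIR directly) works.
   call MPI_BCAST( cbuffarr, lentot, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr_mpi )
   CONF_DIR = cbuffarr(1)

   print '(A,I0,2A)', 'rank ', myrankWORLD, ' received: ', trim(CONF_DIR)

   call MPI_FINALIZE( ierr_mpi )
end program bcast_repro

With a correct mpi_f08 implementation every rank should print the same path; with the combination reported here only rank 0 does, while the slaves print an empty string.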

 

 

Possibly these bugs are also present in the interfaces of other MPI routines of the mpi_f08 module?

 

Greetings

Michael

