Channel: Clusters and HPC Technology

MPI_Allgatherv with large message sizes


Hi,

I'm trying to collect data with MPI_Allgatherv into a large receive buffer whose total size exceeds 2 GB. As far as I understand from this thread (http://software.intel.com/en-us/forums/topic/361060), this is not supported. Unfortunately, when I try to use the -ilp64 option with mpiifort, I run into several problems:
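For reference, here is a small sketch of the arithmetic behind a 2 GB limit. It computes MPI_Allgatherv-style displacements (an exclusive prefix sum of the receive counts) for a made-up layout of 4 ranks sending 100 million 8-byte reals each. These numbers are hypothetical and are not the ones in gather.f. It then checks which byte offsets still fit in a signed 32-bit integer. The idea that a byte offset is held in a 32-bit int somewhere is an assumption made for illustration only. I don't know how Intel MPI handles this internally.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Keep only the low 32 bits of value and read them as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def allgatherv_displs(counts):
    """Exclusive prefix sum of counts, i.e. where each rank's block starts."""
    displs, total = [], 0
    for c in counts:
        displs.append(total)
        total += c
    return displs, total


# Hypothetical layout (not from gather.f): 4 ranks, 100 million 8-byte reals each.
counts = [100_000_000] * 4
extent = 8  # bytes per element, e.g. DOUBLE PRECISION

displs, total = allgatherv_displs(counts)
print(f"total receive buffer: {total * extent / 2**30:.2f} GiB")
for rank, d in enumerate(displs):
    byte_offset = d * extent
    if byte_offset <= INT32_MAX:
        status = "fits in int32"
    else:
        # Assumption for illustration: what a 32-bit int would hold instead
        status = f"overflows, int32 value would be {as_int32(byte_offset)}"
    print(f"rank {rank}: displ = {d} elements, byte offset = {byte_offset} ({status})")
```

In this layout, every count and every displacement in elements fits easily in 32 bits. Only the byte offset of the last block (2.4e9 bytes) exceeds 2^31 - 1, and it wraps to a negative number. That is the kind of failure a limit on total buffer size, rather than on any single count, would produce.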

1) When I include MPI with include 'mpif.h' and run the following commands:

mpiifort -warn -O1 -g -traceback -check bounds -i8 -c gather.f

mpiifort -warn -O1 -g -traceback -check bounds -ilp64  gather.o -o gather.exe-ilp64

mpirun -ilp64 ./gather.exe-ilp64

it aborts with:

Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign
Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign

2) When I include the MPI types through a "use mpi" statement, I can't compile the test program with '-i8' because the compiler reports that the interface is incompatible. I guess this is because it doesn't know that I want to use the ILP64 interface. When compiling and linking in one go, it does compile with only '-ilp64', but not if I add '-i8':

mpiifort -warn -O1 -g -traceback -check bounds -ilp64  gather.f -o gather.exe-ilp64

mpirun -ilp64 ./gather.exe-ilp64

After this, the program still crashes, but now with the following error message:

Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b33c8000010, scount=0, dtype=0x4c000829, rbuf=0x2b34b66b3010, rcounts=0x7fff2b9e7b70, displs=0x7fff2b9e7b60, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1071939176
 BUFRECV =    5.55500000000000     
Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b53e8000010, scount=0, dtype=0x4c000829, rbuf=0x2b54d66b3010, rcounts=0x7fff75f545f0, displs=0x7fff75f545e0, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -484441656

or with:

Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b8c98000010, scount=0, dtype=0x4c000829, rbuf=0x2b8d866b3010, rcounts=0x7fff83e96470, displs=0x7fff83e96460, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1883799144
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source             
libpthread.so.0    00002B5F11907251  Unknown               Unknown  Unknown
libdaploucm.so.2   00002B5F12F7869C  Unknown               Unknown  Unknown
libmpi_dbg.so.4    00002B5F10E8676F  Unknown               Unknown  Unknown
libmpi_dbg.so.4    00002B5F10E83718  dapl_rc_poll_recv         296  dapl_poll_rc.c
libmpi_dbg.so.4    00002B5F10E8330D  MPID_nem_dapl_rc_         124  dapl_poll_rc.c
libmpi_dbg.so.4    00002B5F10FC18C7  MPID_nem_network_          23  mpid_nem_network_poll.c
libmpi_dbg.so.4    00002B5F10DCD90E  MPIDI_CH3I_Progre         735  ch3_progress.c
libmpi_dbg.so.4    00002B5F10F2B592  MPIC_Wait                 568  helper_fns.c
libmpi_dbg.so.4    00002B5F10F290E9  MPIC_Sendrecv             206  helper_fns.c
libmpi_dbg.so.4    00002B5F10F2BA18  MPIC_Sendrecv_ft          717  helper_fns.c
libmpi_dbg.so.4    00002B5F10D7890E  MPIR_Allgatherv_i         770  allgatherv.c
libmpi_dbg.so.4    00002B5F10D7965F  MPIR_Allgatherv           955  allgatherv.c
libmpi_dbg.so.4    00002B5F10D799B0  MPIR_Allgatherv_i        1000  allgatherv.c
libmpi_dbg.so.4    00002B5F10D7C822  PMPI_Allgatherv          1400  allgatherv.c
libmpigf.so.4      00002B5F10AA4279  Unknown               Unknown  Unknown
libmpi_ilp64.so    00002B5F108709C3  Unknown               Unknown  Unknown
gather.exe-ilp64   0000000000403D1B  MAIN__                     56  gather.f
gather.exe-ilp64   0000000000402F1C  Unknown               Unknown  Unknown
libc.so.6          00002B5F11DBECDD  Unknown               Unknown  Unknown
gather.exe-ilp64   0000000000402E19  Unknown               Unknown  Unknown

So this makes me wonder: did I actually compile it properly?
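One thing worth checking is the build in 2) that does compile. It uses -ilp64 without -i8, so the default INTEGER variables in gather.f are presumably still 4 bytes. As I understand it, however, the ILP64 library expects 8-byte integer arguments. The sketch below does not call MPI at all. It only illustrates what happens when an array of 4-byte integers is read back as 8-byte integers on a little-endian machine such as x86-64. The rcounts values are hypothetical.

```python
import struct

nranks = 4

# Hypothetical receive counts as stored by code compiled WITHOUT -i8:
# default INTEGER = 4 bytes, laid out contiguously (little-endian, as on x86-64).
rcounts = [300_000_000, 250_000_000, 5, 7]
memory = struct.pack(f"<{nranks}i", *rcounts)  # 16 bytes

# A library built for 8-byte INTEGERs would read nranks * 8 = 32 bytes from the
# same address. Only the first 16 bytes belong to the array, so only
# nranks // 2 eight-byte values can be decoded from real data here.
glued = list(struct.unpack(f"<{nranks // 2}q", memory))

for i, value in enumerate(glued):
    # Split each 8-byte value back into the two 4-byte counts it was built from
    low, high = struct.unpack("<2i", struct.pack("<q", value))
    print(f"8-byte entry {i}: {value} (low word {low}, high word {high})")

# Entries nranks//2 .. nranks-1 would come from whatever memory follows the
# array, so their contents are not under the program's control.
```

If something like this is going on, the counts the library sees would be two 4-byte values glued together, plus memory past the end of the array. That might also explain why the negative count reported by PMPI_Allgatherv changes from run to run.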

The test program is attached. Here is the output of mpiifort -show:

ifort -I/software/intel/impi/4.1.3.048/intel64/include -I/software/intel/impi/4.1.3.048/intel64/include -L/software/intel/impi/4.1.3.048/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /software/intel/impi/4.1.3.048/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread

and the output of ifort --version:

ifort.orig (IFORT) 13.1.3 20130607
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

Cheers,

Steven

Attachment: gather.f (1.8 KB)
