Hi,
I'm trying to collect data with MPI_Allgatherv into a large receive buffer whose total size is larger than 2 GB. As I understand from this thread (http://software.intel.com/en-us/forums/topic/361060), this is not supported. Unfortunately, when I try to use the -ilp64 option with mpiifort, I run into several problems:
1) When I include MPI with include 'mpif.h' and run the following commands:
mpiifort -warn -O1 -g -traceback -check bounds -i8 -c gather.f
mpiifort -warn -O1 -g -traceback -check bounds -ilp64 gather.o -o gather.exe-ilp64
mpirun -ilp64 ./gather.exe-ilp64
it aborts with:
Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign
Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign
2) When I include the MPI types with a "use mpi" statement, I can't compile the test program with '-i8': the compiler tells me the interface is incompatible. I guess this is because it doesn't know that I want to use the ILP64 interface. Compiling and linking in one step does work with '-ilp64' alone, but not if I add '-i8':
mpiifort -warn -O1 -g -traceback -check bounds -ilp64 gather.f -o gather.exe-ilp64
mpirun -ilp64 ./gather.exe-ilp64
After this, the program still crashes, but now with the following error message:
Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b33c8000010, scount=0, dtype=0x4c000829, rbuf=0x2b34b66b3010, rcounts=0x7fff2b9e7b70, displs=0x7fff2b9e7b60, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1071939176
BUFRECV = 5.55500000000000
Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b53e8000010, scount=0, dtype=0x4c000829, rbuf=0x2b54d66b3010, rcounts=0x7fff75f545f0, displs=0x7fff75f545e0, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -484441656
or with:
Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b8c98000010, scount=0, dtype=0x4c000829, rbuf=0x2b8d866b3010, rcounts=0x7fff83e96470, displs=0x7fff83e96460, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1883799144
forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
libpthread.so.0 00002B5F11907251 Unknown Unknown Unknown
libdaploucm.so.2 00002B5F12F7869C Unknown Unknown Unknown
libmpi_dbg.so.4 00002B5F10E8676F Unknown Unknown Unknown
libmpi_dbg.so.4 00002B5F10E83718 dapl_rc_poll_recv 296 dapl_poll_rc.c
libmpi_dbg.so.4 00002B5F10E8330D MPID_nem_dapl_rc_ 124 dapl_poll_rc.c
libmpi_dbg.so.4 00002B5F10FC18C7 MPID_nem_network_ 23 mpid_nem_network_poll.c
libmpi_dbg.so.4 00002B5F10DCD90E MPIDI_CH3I_Progre 735 ch3_progress.c
libmpi_dbg.so.4 00002B5F10F2B592 MPIC_Wait 568 helper_fns.c
libmpi_dbg.so.4 00002B5F10F290E9 MPIC_Sendrecv 206 helper_fns.c
libmpi_dbg.so.4 00002B5F10F2BA18 MPIC_Sendrecv_ft 717 helper_fns.c
libmpi_dbg.so.4 00002B5F10D7890E MPIR_Allgatherv_i 770 allgatherv.c
libmpi_dbg.so.4 00002B5F10D7965F MPIR_Allgatherv 955 allgatherv.c
libmpi_dbg.so.4 00002B5F10D799B0 MPIR_Allgatherv_i 1000 allgatherv.c
libmpi_dbg.so.4 00002B5F10D7C822 PMPI_Allgatherv 1400 allgatherv.c
libmpigf.so.4 00002B5F10AA4279 Unknown Unknown Unknown
libmpi_ilp64.so 00002B5F108709C3 Unknown Unknown Unknown
gather.exe-ilp64 0000000000403D1B MAIN__ 56 gather.f
gather.exe-ilp64 0000000000402F1C Unknown Unknown Unknown
libc.so.6 00002B5F11DBECDD Unknown Unknown Unknown
gather.exe-ilp64 0000000000402E19 Unknown Unknown Unknown
So, that makes me wonder: did I actually compile it properly?
The test program is attached. Here is the output of mpiifort -show:
ifort -I/software/intel/impi/4.1.3.048/intel64/include -I/software/intel/impi/4.1.3.048/intel64/include -L/software/intel/impi/4.1.3.048/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /software/intel/impi/4.1.3.048/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread
and ifort --version:
ifort.orig (IFORT) 13.1.3 20130607
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
grtz
Steven