Dear Intel support team,
I have a problem with the MPI_File_read_all and MPI_File_write_all subroutines. I have a Fortran code that should read a large binary file (~2 TB). The file contains a few 2D matrices, the largest of which is ~0.5 TB. I read this file using MPI I/O subroutines, roughly like this:
call MPI_TYPE_CREATE_SUBARRAY(2,dim,loc_sizes,loc_starts,MPI_ORDER_FORTRAN,MPI_DOUBLE_PRECISION,my_subarray,ierr)
call MPI_Type_commit(my_subarray,ierr)
call MPI_File_set_view(filehandle, disp,MPI_DOUBLE_PRECISION,my_subarray, &
"native",MPI_INFO_NULL, ierr)
call MPI_File_read_all(filehandle, float2d, loc_sizes(1)*loc_sizes(2),MPI_DOUBLE_PRECISION,status, ierr)
The problem occurs in the MPI_File_read_all call. The number of elements in each submatrix, loc_sizes(1)*loc_sizes(2), multiplied by the element size (8 bytes for double precision), cannot be larger than the largest default integer value, 2147483647 (~2 GB). In my case each submatrix will be more than 10-20 GB. I tried using integer*8 instead of integer*4 for the count, but it did not help; I think the MPI subroutine converts it back to integer*4. Is there a solution to this problem, similar to what was done for MPI_File_set_view, where the displacement type was changed from integer to INTEGER(KIND=MPI_OFFSET_KIND), INTENT(IN) :: disp? The program works fine if the submatrix size is smaller than 2147483647 bytes.
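For illustration, one commonly used way to keep the count argument within default-integer range is to wrap a whole local column in a derived datatype and pass the number of columns as the count. The following is only a sketch under assumptions: the subroutine name read_block_by_columns and the handle column_type are hypothetical, the file view is assumed to be already set as in the calls above, and it is not verified that this avoids the crash, since the library may still overflow internally on the total byte size.

```fortran
! Hypothetical helper: reads the local block with a small count argument.
! Assumes MPI_File_set_view has already been called on filehandle with
! etype MPI_DOUBLE_PRECISION and the subarray filetype shown above.
subroutine read_block_by_columns(filehandle, float2d, loc_sizes, ierr)
  use mpi
  implicit none
  integer, intent(in)  :: filehandle
  integer, intent(in)  :: loc_sizes(2)
  double precision, intent(inout) :: float2d(loc_sizes(1), loc_sizes(2))
  integer, intent(out) :: ierr

  integer :: column_type                 ! derived type: one local column
  integer :: status(MPI_STATUS_SIZE)

  ! One element of column_type is loc_sizes(1) contiguous doubles,
  ! i.e. one full column of the local block in Fortran (column-major) order.
  call MPI_Type_contiguous(loc_sizes(1), MPI_DOUBLE_PRECISION, column_type, ierr)
  if (ierr /= MPI_SUCCESS) return
  call MPI_Type_commit(column_type, ierr)
  if (ierr /= MPI_SUCCESS) return

  ! The count is now the number of columns, loc_sizes(2), instead of
  ! loc_sizes(1)*loc_sizes(2), so it no longer overflows a default integer.
  call MPI_File_read_all(filehandle, float2d, loc_sizes(2), column_type, status, ierr)
  if (ierr /= MPI_SUCCESS) return

  call MPI_Type_free(column_type, ierr)
end subroutine read_block_by_columns
```

It would replace the direct MPI_File_read_all call, for example: call read_block_by_columns(filehandle, float2d, loc_sizes, ierr). This only shrinks the count argument; the amount of data moved per call is unchanged.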
Here is the error message that I got:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libifcore.so.5 00002ADA8C450876 for__signal_handl Unknown Unknown
libc-2.17.so 00002ADA928C8670 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91AAEB06 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91AAF780 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91AA3039 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91AA49E4 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91727370 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA919A1C00 Unknown Unknown Unknown
libmpi.so.12.0 00002ADA91971B90 Unknown Unknown Unknown
libmpi.so.12 00002ADA9193EFF8 MPI_Isend Unknown Unknown
libmpi.so.12.0 00002ADA91695A61 Unknown Unknown Unknown
libmpi.so.12 00002ADA916943B8 ADIOI_GEN_ReadStr Unknown Unknown
libmpi.so.12 00002ADA91A6DDF5 PMPI_File_read_al Unknown Unknown
libmpifort.so.12. 00002ADA912AB4CB mpi_file_read_all Unknown Unknown
jorek_model199 000000000044E747 vacuum_response_m 519 vacuum_response.f90
jorek_model199 000000000044B770 vacuum_response_m 986 vacuum_response.f90
jorek_model199 000000000044A6F4 vacuum_response_m 90 vacuum_response.f90
jorek_model199 000000000041134E MAIN__ 486 jorek2_main.f90
jorek_model199 000000000040C95E Unknown Unknown Unknown
libc-2.17.so 00002ADA928B4B15 __libc_start_main Unknown Unknown
Thank you in advance,
Mochalskyy Serhiy