Channel: Clusters and HPC Technology

MPI_File_read_all MPI_File_write_all local size limit


Dear Intel support team,

I have a problem with the MPI_File_read_all and MPI_File_write_all subroutines. I have a Fortran code that needs to read a large binary file (~2 TB). The file contains a few 2D matrices; the largest matrix is ~0.5 TB. I read this file using MPI-IO subroutines, roughly like this:

          call MPI_TYPE_CREATE_SUBARRAY(2, dim, loc_sizes, loc_starts, MPI_ORDER_FORTRAN, &
                         MPI_DOUBLE_PRECISION, my_subarray, ierr)
          call MPI_Type_commit(my_subarray, ierr)
          call MPI_File_set_view(filehandle, disp, MPI_DOUBLE_PRECISION, my_subarray, &
                         "native", MPI_INFO_NULL, ierr)

          call MPI_File_read_all(filehandle, float2d, loc_sizes(1)*loc_sizes(2), &
                         MPI_DOUBLE_PRECISION, status, ierr)

The problem occurs in the MPI_File_read_all call. The number of elements in each local block, loc_sizes(1)*loc_sizes(2), multiplied by the element size (8 bytes for double precision), cannot exceed the largest default integer, 2147483647 (~2 GB). In my case each local block is 10-20 GB. I tried using integer*8 instead of integer*4 for the count, but it did not help; the MPI routine apparently converts it back to integer*4. Is there a solution to this problem, as was done for MPI_File_set_view, where the displacement argument was changed from a plain integer to INTEGER(KIND=MPI_OFFSET_KIND), INTENT(IN) :: disp? The program works fine when the local block is smaller than 2147483647 bytes.
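For illustration, here is a minimal sketch of the kind of workaround that keeps the count argument small, assuming it is the element count at the MPI interface that overflows: wrap one column of the local block in a contiguous derived type, so that MPI_File_read_all is passed loc_sizes(2) columns instead of loc_sizes(1)*loc_sizes(2) elements. The name column_type is illustrative, not part of the original code.

          ! Sketch only: column_type is a hypothetical derived type describing one
          ! column (loc_sizes(1) doubles), so the count passed to MPI_File_read_all
          ! is just loc_sizes(2) rather than loc_sizes(1)*loc_sizes(2).
          integer :: column_type

          call MPI_Type_contiguous(loc_sizes(1), MPI_DOUBLE_PRECISION, column_type, ierr)
          call MPI_Type_commit(column_type, ierr)

          call MPI_File_read_all(filehandle, float2d, loc_sizes(2), column_type, status, ierr)

          call MPI_Type_free(column_type, ierr)

Note that this only lowers the count passed at the MPI interface; if the implementation still accumulates the total transfer size in a 32-bit integer internally (which the ADIOI frames in the traceback below suggest), the read may have to be split into several collective calls, with every rank making the same number of calls.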

Here is the error message that I got:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libifcore.so.5     00002ADA8C450876  for__signal_handl     Unknown  Unknown
libc-2.17.so       00002ADA928C8670  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAEB06  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAF780  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA3039  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA49E4  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91727370  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA919A1C00  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91971B90  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA9193EFF8  MPI_Isend             Unknown  Unknown
libmpi.so.12.0     00002ADA91695A61  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA916943B8  ADIOI_GEN_ReadStr     Unknown  Unknown
libmpi.so.12       00002ADA91A6DDF5  PMPI_File_read_al     Unknown  Unknown
libmpifort.so.12.  00002ADA912AB4CB  mpi_file_read_all     Unknown  Unknown
jorek_model199     000000000044E747  vacuum_response_m         519  vacuum_response.f90
jorek_model199     000000000044B770  vacuum_response_m         986  vacuum_response.f90
jorek_model199     000000000044A6F4  vacuum_response_m          90  vacuum_response.f90
jorek_model199     000000000041134E  MAIN__                    486  jorek2_main.f90
jorek_model199     000000000040C95E  Unknown               Unknown  Unknown
libc-2.17.so       00002ADA928B4B15  __libc_start_main     Unknown  Unknown

 

Thank you in advance,

Mochalskyy Serhiy

 

Thread Topic: Bug Report
