Channel: Clusters and HPC Technology

MPI Bus Error


I'm developing an MPI application that relies heavily on MPI shared memory. Recently, I keep hitting the following error messages:

 

srun: error: compute-42-013: task 32: Bus error
srun: Terminating job step 324080.0
slurmstepd: error: *** STEP 324080.0 ON compute-42-012 CANCELLED AT 2020-06-14T04:17:51 ***
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
pVelodyne_intel_4  000000000C8E308E  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B370FBBA5D0  Unknown               Unknown  Unknown
pVelodyne_intel_4  000000000397721B  PMPIDI_CH3I_Progr        1040  ch3_progress.c
pVelodyne_intel_4  00000000039FC370  MPIC_Wait                 269  helper_fns.c
pVelodyne_intel_4  00000000039FD83A  MPIC_Sendrecv             580  helper_fns.c
pVelodyne_intel_4  000000000392F61B  MPIR_Allgather_in         257  allgather.c
pVelodyne_intel_4  0000000003931752  MPIR_Allgather            858  allgather.c
pVelodyne_intel_4  0000000003931A77  MPIR_Allgather_im         905  allgather.c
pVelodyne_intel_4  0000000003933226  PMPI_Allgather           1068  allgather.c
pVelodyne_intel_4  000000000392CECE  Unknown               Unknown  Unknown

srun: error: compute-41-006: task 16: Bus error
srun: Terminating job step 324024.0
slurmstepd: error: *** STEP 324024.0 ON compute-41-006 CANCELLED AT 2020-06-13T16:54:13 ***
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
pVelodyne_intel_4  000000000C85058E  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AEBC007A5D0  Unknown               Unknown  Unknown
pVelodyne_intel_4  00000000038E46DB  PMPIDI_CH3I_Progr        1040  ch3_progress.c
pVelodyne_intel_4  0000000003969830  MPIC_Wait                 269  helper_fns.c
pVelodyne_intel_4  000000000396ACFA  MPIC_Sendrecv             580  helper_fns.c
pVelodyne_intel_4  00000000038BA379  MPIR_Alltoall_int         438  alltoall.c
pVelodyne_intel_4  00000000038BBE3D  MPIR_Alltoall             734  alltoall.c
pVelodyne_intel_4  00000000038BC162  MPIR_Alltoall_imp         775  alltoall.c
pVelodyne_intel_4  00000000038BD875  PMPI_Alltoall             958  alltoall.c
 

The bus error seems to occur inside MPI routines: the tracebacks above go through PMPI_Allgather and PMPI_Alltoall. Since I do not have the Intel MPI source code, I have no idea what went wrong.

The Intel MPI version I'm using is the one from intel_parallel_studio/2018u4/compilers_and_libraries_2018.5.274.
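One thing I can check on the compute nodes, in case it is relevant: on Linux, SIGBUS can be raised when a process touches a memory-mapped page that the backing filesystem cannot supply, so a full shared-memory filesystem is one possible (unconfirmed) cause. Below is a minimal sketch for reporting its usage. It assumes the shared-memory segments are backed by /dev/shm on these nodes, which I have not verified for Intel MPI; the helper names `shm_usage` and `format_report` are made up for this example.

```python
import os
import shutil
import sys

GIB = 1024 ** 3


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the filesystem holding `path`.

    The default path is an assumption: many Linux systems mount a tmpfs
    at /dev/shm, but that is not guaranteed on every cluster.
    """
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def format_report(total, used, free):
    """Format byte counts as a one-line human-readable summary."""
    pct = 100.0 * used / total if total else 0.0
    return (
        f"total={total / GIB:.2f} GiB, used={used / GIB:.2f} GiB, "
        f"free={free / GIB:.2f} GiB ({pct:.1f}% used)"
    )


if __name__ == "__main__":
    # Optional first argument overrides the path to inspect.
    target = sys.argv[1] if len(sys.argv) > 1 else "/dev/shm"
    if os.path.exists(target):
        print(format_report(*shm_usage(target)))
    else:
        print(f"{target} does not exist on this machine")
```

Running this on each allocated node shortly before the failing collective (for example, one task per node under srun) would show whether the filesystem is close to full at that point.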

Any idea how to fix it? 

 

Thanks.

 

 

