Hi All,
I will explain the current situation and the attached file.
We are currently debugging an MPI application launched through LSF because it sometimes fails to terminate. At the code level we suspect mpi_finalize; the hang occurs randomly, not every time, so we still need to narrow down the conditions under which it happens. I asked about a similar symptom on the MPI forum, but the outcome is unknown because the thread was moved to a ticket partway through.
Please check whether this is a similar symptom:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...
- Strace results from the MPI execution hosts (the line I suspect is "│ + 01:17:43 read(7")
----------------------------------------------------------------------------------------
duru0403 has 24 procs as below:
* Name/State : pmi_proxy / State: S (sleeping)
PID/PPID : 141955 / 141954
Commandline : **************/apps/intel/18.4/impi/2018.4.274/intel64/bin/pmi_proxy --control-port duru0374:37775 --pmi-connect alltoall --pmi-aggregate -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1000390395 --usize -2 --proxy-id -1
CPU/MEMs_allowed : 0-95 / 0-3
[<ffffffff96e56e55>] poll_schedule_timeout+0x55/0xb0
[<ffffffff96e585dd>] do_sys_poll+0x48d/0x590
[<ffffffff96e587e4>] SyS_poll+0x74/0x110
[<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
[<ffffffffffffffff>] 0xffffffffffffffff
Files :
Num of pipes: 26
Num of sockets: 16
Num of anon_inodes: 0
Strace :
+ /xshared/support/systrace/strace: Process 141955 attached
+ 01:17:43 restart_syscall(<... resuming interrupted poll ...>/xshared/support/systrace/strace: Process 141955 detached
+ <detached ...>
Num of subprocs : 23
│
├─Name/State : ensda / State: S (sleeping)
│ PID/PPID : 141959 / 141955
│ Commandline : **************
│ CPU/MEMs_allowed : 0 / 0-3
│ [<ffffffff972f5139>] unix_stream_read_generic+0x309/0x8e0
│ [<ffffffff972f5804>] unix_stream_recvmsg+0x54/0x70
│ [<ffffffff972186ec>] sock_aio_read.part.9+0x14c/0x170
│ [<ffffffff97218731>] sock_aio_read+0x21/0x30
│ [<ffffffff96e404d3>] do_sync_read+0x93/0xe0
│ [<ffffffff96e40fb5>] vfs_read+0x145/0x170
│ [<ffffffff96e41dcf>] SyS_read+0x7f/0xf0
│ [<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
│ [<ffffffffffffffff>] 0xffffffffffffffff
│ Files :
│ - > /dev/infiniband/uverbs0
│ - > **************/log_proc00324.log
│ - /dev/infiniband/uverbs0
│ Num of pipes: 6
│ Num of sockets: 5
│ Num of anon_inodes: 6
│ Strace :
│ + /xshared/support/systrace/strace: Process 141959 attached
│ + 01:17:43 read(7, /xshared/support/systrace/strace: Process 141959 detached
│ + <detached ...>
│ Num of subprocs : 0
----------------------------------------------------------------------------------------
- Version information
Intel Compiler: 18.5.234
Intel MPI: 18.4.234
DAPL: ofa-v2-mlx5_0-1u
- MPI options I used
declare -x I_MPI_DAPL_UD="1"
declare -x I_MPI_FABRICS="dapl"
declare -x I_MPI_HYDRA_BOOTSTRAP="lsf"
declare -x I_MPI_PIN="1"
declare -x I_MPI_PIN_PROCESSOR_LIST="0-5,24-29"
declare -x I_MPI_ROOT="**************/apps/intel/18.4/compilers_and_libraries/linux/mpi"
- The code I used
After the call to MPI_FINALIZE there are five lines of code, consisting of if, close, and deallocate statements.
Can these cause the hang problem? (A rough checking sketch follows the code below.)
! last part of main_program
call fin_common_par
! (there is nothing after this)
end program
!!!!!!!!!!!!!!!!!
subroutine fin_common_par
  implicit none
  integer :: ierr
  call mpi_finalize(ierr)
  call fin_log
  if(allocated(ranks_per_node)) deallocate(ranks_per_node)
  if(allocated(stride_ranks)) deallocate(stride_ranks)
  return
end subroutine fin_common_par
!!!!!!!!!!!!!!!!!
subroutine fin_log
  implicit none
  if(logf_unit == closed_unit) return
  close(logf_funit)
  logf_unit = closed_unit
  return
end subroutine fin_log
!!!!!!!!!!!!!!!!!
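To narrow this down, one option might be to put markers around the finalize call. The following is only a rough sketch, not the real routine: the subroutine name, the rank caching, and the use of iso_fortran_env are my own assumptions. The idea is that if the "after mpi_finalize" marker never appears, the close/deallocate lines that follow cannot be what is hanging.

! Rough sketch (illustrative only): markers around mpi_finalize to see
! whether control ever returns from it.
subroutine fin_common_par_check          ! hypothetical name, not the real routine
  use mpi                                ! Intel MPI provides the Fortran 'mpi' module
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  integer :: ierr, myrank

  call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
  write(error_unit,*) 'rank', myrank, ': before mpi_finalize'
  flush(error_unit)

  call mpi_finalize(ierr)

  ! If this line is never printed, the hang is inside mpi_finalize itself,
  ! not in the later close/deallocate statements.
  write(error_unit,*) 'rank', myrank, ': after mpi_finalize, ierr =', ierr
  flush(error_unit)
end subroutine fin_common_par_check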
Additionally, how can I get the call stack of a process, as shown in this post?
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...
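What I have so far are only the kernel-side stacks shown above (they appear to come from /proc/<pid>/stack). My rough understanding is that a user-space backtrace could be captured by attaching a debugger to the hung rank, roughly as below; the PID is just the example process from the dump, and pstack may not be installed everywhere.

# Kernel-side stack, as already shown in the report above (may require root):
cat /proc/141959/stack

# User-space call stack by attaching gdb non-interactively to the hung rank:
gdb -p 141959 -batch -ex "thread apply all bt"

# pstack, if available, gives a similar result:
pstack 141959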
Thank you in advance.