Hi All,
I'm an HPC admin. I installed the mpi4py library (version 3.0) on our clusters using the .tar archive and pip2.7 (Python 2.7). Since then, a 256-core job (n=4, ppn=64) no longer runs on the nodes. The problem started right after the mpi4py 3.0 installation. Plain (non-MPI) Python code still runs fine.
Users can no longer run MPI-based jobs on the cluster (VASP, mpi4py, MPI, Open MPI, etc.).
The error is given below:
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error
[mpiexec@tyrone-node16] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:184): assert (!closed) failed
[mpiexec@tyrone-node16] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1 downstream
[mpiexec@tyrone-node16] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@tyrone-node16] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@tyrone-node16] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completions
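The log lines come from MPICH's Hydra launcher (./ui/mpich/mpiexec.c). One possibility worth checking, and this is only an assumption rather than a confirmed diagnosis, is that the pip-installed mpi4py was built against a different MPI than the mpiexec that launches the jobs. Below is a minimal diagnostic sketch, written to run under Python 2.7. It prints which mpiexec/mpirun come first on PATH and, if mpi4py imports, its version, install path and build settings from mpi4py.get_config(). The helper names which() and collect_report() are hypothetical. The script imports only the mpi4py package and not mpi4py.MPI, because importing MPI calls MPI_Init, which is the call that is failing.

```python
# Hypothetical diagnostic sketch (Python 2.7 and 3 compatible).
# Assumption: a mismatch between the launcher on PATH and the MPI that
# mpi4py was built against is ONE possible cause of "Fatal error in MPI_Init".
# The script only reports information; it changes nothing.
from __future__ import print_function

import os
import sys


def which(name):
    """Return the first executable called `name` on PATH, or None.

    Written by hand because shutil.which() does not exist in Python 2.7.
    """
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def collect_report():
    """Gather interpreter, launcher paths and mpi4py build info into a dict."""
    report = {
        "python": sys.executable,
        "mpiexec": which("mpiexec"),
        "mpirun": which("mpirun"),
    }
    try:
        # Import the package only; "from mpi4py import MPI" would call MPI_Init.
        import mpi4py
        report["mpi4py_version"] = mpi4py.__version__
        report["mpi4py_path"] = mpi4py.__file__
        # get_config() returns the build settings, e.g. the mpicc that was used.
        report["mpi4py_config"] = mpi4py.get_config()
    except ImportError as exc:
        report["mpi4py_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in sorted(collect_report().items()):
        print("{0}: {1}".format(key, value))
```

If the mpicc reported in mpi4py_config belongs to a different MPI installation than the mpiexec found on PATH, that would be a lead to follow up; if they match, the cause probably lies elsewhere.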
Kindly help me.
Thanks in Advance!
Rahul Akolkar