Hi
I have Intel MPI v4.1.0.024 installed. When I run a "Hello world" test program on one node with 12 CPUs it works, but when I run it on two nodes (24 CPUs) it fails with the following message:
[mpiexec@red0044] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:221): assert (!closed) failed
[mpiexec@red0044] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:128): unable to send SIGUSR1 downstream
[mpiexec@red0044] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@red0044] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[mpiexec@red0044] main (./ui/mpich/mpiexec.c:745): process manager error waiting for completion
My qsub script is below:
module load intel/mpi/4.1.0.024
nprocs=`wc -l $PBS_NODEFILE | awk '{ print $1 }'`
echo $PBS_NODEFILE
mpirun -n $nprocs ./test > output_file
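
For reference, note that `echo $PBS_NODEFILE` prints only the path of the node file, not the hosts in it. A minimal sketch of a check that shows how many slots and distinct hosts the job was actually given is below. It assumes the usual PBS layout of one hostname per line, repeated once per slot; `summarize_nodefile` is a hypothetical helper name, not part of PBS or Intel MPI.

```shell
#!/bin/sh
# Hypothetical helper: summarise a PBS-style node file.
# Assumes one hostname per line, repeated once per allocated slot.
summarize_nodefile() {
    nodefile="$1"
    # Total number of lines = number of MPI slots requested.
    nprocs=$(wc -l < "$nodefile" | tr -d ' ')
    # Number of distinct hostnames = number of nodes allocated.
    nnodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    echo "nprocs=$nprocs nnodes=$nnodes"
}

# Only runs inside a PBS job, where PBS_NODEFILE is set and readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
    # Print the host list itself, not just the file path.
    cat "$PBS_NODEFILE"
fi
```

If the two-node job reports two distinct hosts here, the allocation itself is as expected and the problem is more likely in how the processes are launched on the second node.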
---------------------------------------
The module I am loading:
setenv MPIROOT /local/software/rh53/intel/mpi/4.1.0
prepend-path PATH /local/software/rh53/intel/mpi/4.1.0/bin64
prepend-path LD_LIBRARY_PATH /local/software/rh53/intel/mpi/4.1.0/lib64
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Could you please point out whether I am doing something wrong?
Regards,
Max