Hi everyone,
I'm using MPICH2 v1.5 to run my WRF model on Intel Xeon processors. The model runs fine on a single node with any number of cores, but as soon as the number of processes exceeds the number of cores on one node, it crashes with the following error:
*********************************************************************************************************************************************
[proxy:0:0@hpc1934] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:0@hpc1934] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@hpc1934] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:7@hpc1945] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:7@hpc1945] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:7@hpc1945] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:5@hpc1940] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:5@hpc1940] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:5@hpc1940] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec@hpc1934] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@hpc1934] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@hpc1934] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec@hpc1934] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion
**********************************************************************************************************************************************
This is how I run the model:
$ulimit -s unlimited
$source ~/setup-intel.sh
$mpiexec -np nproc ./wrf.exe >& benchmark#n.log
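For anyone who wants to reproduce this, the three commands above can be sketched as one bash script. This is only an illustration: `NPROC`, `RUN_ID`, `LAUNCH` and the `log_name` helper are hypothetical names I made up to stand in for the `nproc` and `#n` placeholders, and I am assuming the log name is simply "benchmark" followed by the run number. In bash, `>& file` is equivalent to `> file 2>&1`, which is the form used here.

```shell
#!/bin/bash
# Sketch only: NPROC, RUN_ID, LAUNCH and log_name are hypothetical names,
# standing in for the "nproc" and "#n" placeholders in the commands above.
NPROC="${NPROC:-16}"    # total number of MPI ranks (placeholder value)
RUN_ID="${RUN_ID:-1}"   # run number used in the log file name (placeholder value)

# Print the log file name for a given run number (assumed naming scheme).
log_name() {
    printf 'benchmark%s.log\n' "$1"
}

# Launch only when explicitly requested with LAUNCH=1,
# so the file can be sourced or inspected without starting a run.
if [ "${LAUNCH:-0}" = "1" ]; then
    ulimit -s unlimited          # raise the stack limit in this shell
    source ~/setup-intel.sh      # load the Intel environment, as above
    # Send stdout and stderr to the same log file (same effect as ">&").
    mpiexec -np "$NPROC" ./wrf.exe > "$(log_name "$RUN_ID")" 2>&1
fi
```

It would be started with something like `LAUNCH=1 NPROC=32 RUN_ID=2 bash run_wrf.sh`, where the script name is again just an example.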
I would appreciate any help in this regard.
Best regards,
Arash