Hi!
I have successfully compiled and linked a program with Intel MPI. When I run it interactively or in the background, it runs very fast and without any problems on our new server (ProLiant DL580 Gen10: 1 node with 4 processors of 18 cores each, 72 cores in total, hyperthreading disabled). When I submit it through Torque (version 4), strange things happen. For example:
1) If I submit 2 jobs, each requesting 8 cores, both run fine.
2) If I submit a third job (8 cores), it is 4 times slower because its 8 processes run on only two cores!
3) If I submit a fourth job, it runs properly, but if I qdel all four jobs, they all disappear from qstat -a while the fourth one keeps running!
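Observations 2) and 3) can be checked from a login shell on the node by listing the CPU set the kernel allows for each rank. The following is only a minimal sketch, assuming a Linux /proc filesystem and the pgrep utility; the function names and the binary name my_mpi_prog are hypothetical placeholders, not part of my actual setup.

```shell
#!/bin/sh
# Print the list of CPUs the kernel allows for one PID (Linux /proc only).
cpus_allowed() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Print "PID allowed-cpus" for every process whose name matches exactly.
# The name passed in (e.g. my_mpi_prog) is a hypothetical placeholder.
show_affinity() {
    for pid in $(pgrep -x "$1"); do
        echo "$pid $(cpus_allowed "$pid")"
    done
}

# Example call (replace the name with the real binary):
#   show_affinity my_mpi_prog
```

If the 8 ranks of the slow job all report the same two CPUs, that would match what I see in 2). Running the same call after qdel would list any ranks that survived, as in 3).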
From previous discussions I have seen in this forum, I have the feeling that this is an integration problem between Intel MPI and Torque, so I set the following:
export I_MPI_PIN=off
export I_MPI_PIN_DOMAIN=socket
To run the program I called mpirun as follows:
/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpirun -d -rmk pbs -bootstrap pbsdsh .................
I have checked that PBS_ENVIRONMENT is properly set to PBS_BATCH.
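A check of this kind can be done from inside the job script. The sketch below is only illustrative: it relies on the PBS_ENVIRONMENT and PBS_NODEFILE variables that Torque sets inside a job, and the function name pbs_summary is a hypothetical helper.

```shell
#!/bin/sh
# Report what Torque handed to the job: batch mode and number of slots.
# PBS_ENVIRONMENT and PBS_NODEFILE are set by Torque inside a job.
pbs_summary() {
    if [ "${PBS_ENVIRONMENT:-}" = "PBS_BATCH" ]; then
        echo "mode=batch"
    else
        echo "mode=${PBS_ENVIRONMENT:-unset}"
    fi
    if [ -r "${PBS_NODEFILE:-}" ]; then
        # The node file has one line per allocated slot,
        # so an 8-core request should give 8 lines.
        echo "slots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
    else
        echo "slots=unknown"
    fi
}

# Example call near the top of the job script:
#   pbs_summary
```

For an 8-core job this should print mode=batch and slots=8. A different slot count would suggest that the request itself, rather than the pinning, is the problem.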
The Torque configuration also appears to be correct. The file
/var/lib/torque/server_priv/nodes contains the following line:
dscfbeta1.units.it np=72 num_node_boards=1
This is a severe problem for me: the machine is shared, so we need a scheduler like Torque (PBS) to run jobs compiled and linked against Intel MPI. Any help or suggestion is welcome!
Thank you in advance,
Mauro