Channel: Clusters and HPC Technology

Integration problem between Torque 4 and Intel(R) MPI Library for Linux* OS, Version 2019 Update 1


Hi!

I have successfully compiled and linked a program with Intel MPI. When I run it interactively or in the background, it runs very fast and without any problems on our new server (ProLiant DL580 Gen10, 1 node with 4 processors of 18 cores each, 72 cores in total, hyperthreading disabled). When I submit it through Torque (version 4), strange things happen, for example:

1) If I submit 2 jobs, each requesting 8 cores, both run fine.

2) If I submit a third job (8 cores), it is 4 times slower because its 8 processes run on only two cores!

3) If I submit a fourth job, it runs properly. But if I qdel all four jobs, they all disappear from qstat -a while the fourth keeps running!
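
One way to confirm symptom 2 is to read the CPU affinity that the kernel reports for each MPI process. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function name show_affinity, the PROC_ROOT variable and the process name my_mpi_program are hypothetical and not part of my actual setup.

```shell
#!/bin/sh
# show_affinity: print "<pid> <allowed CPU list>" for each PID given.
# Reads the Cpus_allowed_list field from <PROC_ROOT>/<pid>/status.
# PROC_ROOT is a hypothetical override; it defaults to /proc.
show_affinity() {
    root="${PROC_ROOT:-/proc}"
    for pid in "$@"; do
        if [ -r "$root/$pid/status" ]; then
            cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$root/$pid/status")
            printf '%s %s\n' "$pid" "$cpus"
        else
            printf '%s (no readable status file)\n' "$pid" >&2
        fi
    done
}

# Example (my_mpi_program is a placeholder for the real binary name):
# show_affinity $(pgrep -x my_mpi_program)
```

If the 8 processes of the slow job all report the same one or two CPUs, the slowdown would be a pinning problem rather than a problem in the program itself.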

From earlier discussions on this forum, I suspect this is an integration problem between Intel MPI and Torque, so I set the following:

 export I_MPI_PIN=off
 export I_MPI_PIN_DOMAIN=socket

To run the program, I called mpirun as follows:

/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpirun -d -rmk pbs -bootstrap pbsdsh .................

I have checked that PBS_ENVIRONMENT is correctly set to PBS_BATCH.
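
This kind of check can be done by printing a few variables from inside the job script. A rough sketch, assuming the usual Torque variables PBS_ENVIRONMENT, PBS_JOBID and PBS_NODEFILE are exported to the job; the function name pbs_summary is made up for illustration.

```shell
#!/bin/sh
# pbs_summary: report what the batch system handed to this job.
pbs_summary() {
    echo "PBS_ENVIRONMENT=${PBS_ENVIRONMENT:-unset}"
    echo "PBS_JOBID=${PBS_JOBID:-unset}"
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # The node file lists one line per allocated slot,
        # so the count should match the number of cores requested.
        echo "slots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
    else
        echo "slots=unknown"
    fi
}
```

Calling pbs_summary just before mpirun in the job script puts the values in the job's output file, so the slot count can be compared with the 8 cores requested.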

The Torque configuration also appears to be correct; the file

/var/lib/torque/server_priv/nodes contains the following line:

dscfbeta1.units.it np=72 num_node_boards=1

This is a serious problem for me: the machine is shared, so we need a scheduler such as Torque (PBS) to run jobs compiled and linked against Intel MPI. Any help or suggestion is welcome!

Thank you in advance,

Mauro

