
Trying to use I_MPI_PIN_DOMAIN=socket

I'm running on an IBM cluster whose nodes have dual-socket Ivy Bridge processors and two NVIDIA Tesla K40 cards.  I'm trying to run 4 MPI ranks with Intel MPI 5 Update 2, placing one MPI rank on each socket.  To learn how to do this, I'm using a simple MPI "Hello World" program that prints the host name, rank, and CPU ID (sketched below).  When I run with 2 MPI ranks, the program behaves as expected.  When I run with 4 MPI ranks using the mpirun that comes with Intel MPI, all 4 ranks land on the node I launched from.  I am doing this interactively and get a set of two nodes with the following command:

qsub -I -l nodes=2,ppn=16 -q k20
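
For reference, the test program is essentially just the following.  This is a minimal sketch rather than the exact hw_ibm_impi source, but it does the same thing: it uses sched_getcpu() (Linux-specific) to report which CPU each rank is running on.

#define _GNU_SOURCE          /* needed for sched_getcpu() */
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>          /* gethostname() */

int main(int argc, char **argv)
{
    int rank, size;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));

    /* sched_getcpu() reports the core this rank is currently running on */
    printf("host=%s rank=%d/%d cpu=%d\n", host, rank, size, sched_getcpu());

    MPI_Finalize();
    return 0;
}

It is built with the mpiicc wrapper from the same Intel MPI installation.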

I am using the following commands to run my program:

source /opt/intel/bin/compilervars.sh intel64; \
source /opt/intel/impi_latest/intel64/bin/mpivars.sh; \
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi
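
To double-check where the ranks actually land, I believe I can also add I_MPI_DEBUG=4 to the same command (same environment setup as above) so that Intel MPI prints its pinning map at startup:

/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_DEBUG=4 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi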

If I use a different qsub command, e.g. qsub -I -l nodes=2,ppn=2 -q k20, the program runs as expected with 2 ranks on each node.  But that does not seem like the right way to request my node allocation if I also want to run threads from each MPI rank (a sketch of what I'm aiming for is below).  Also, with my original qsub command I can run 32 ranks at 16 ranks per host and the application behaves as expected.
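
For context, what I'm ultimately aiming for with the nodes=2,ppn=16 allocation is a hybrid run along these lines.  This is just a sketch: -ppn 2 forces two ranks per node, and the 8 OpenMP threads per rank assumes each 16-core node has 8 cores per socket.

export OMP_NUM_THREADS=8; \
/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -ppn 2 -n 4 hw_ibm_impi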

If I try the Intel mpiexec command instead of mpirun, I get the following result:

source /opt/intel/bin/compilervars.sh intel64; \
source /opt/intel/impi_latest/intel64/bin/mpivars.sh; \
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
/opt/intel/impi_latest/intel64/bin/mpiexec -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi
mpiexec_ibm-011: cannot connect to local mpd (/tmp/mpd2.console_username); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

Any ideas why this is not working?  Am I not using I_MPI_PIN_DOMAIN correctly?  Could there be something messed up with the Intel MPI installation on the cluster?  Or some problem with the installation of the scheduler?

Thanks,

Dave

 

