Hi All,
As per the recent webinar introducing the new Intel MPI 2019 Update 5 features, it is now, at least in theory, possible to include the Intel MPI libraries and call mpirun for a multi-node MPI job entirely inside a Singularity container, with no need to have Intel MPI installed outside the container. So instead of launching an MPI job in a container using an external MPI stack, like so:
mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> singularity exec <container_name> <path_to_executable_inside_container>
one should now be able to do:
singularity exec <container_name> mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> <path_to_executable_inside_container>
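For concreteness, using the image and host list from the test further below, and hostname as a stand-in for the executable, I read the new style as meaning something like:
singularity exec image.sif mpirun -n 78 -perhost 20 -hosts appro07,appro08,appro09,appro10 hostname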
I have the Intel MPI 2019.5 libraries (as well as the Intel run-time libraries for C++), plus libfabric, inside my container, and the container sources the following environment script:
cat /.singularity.d/env/90-environment.sh
#!/bin/sh
# Custom environment shell code should follow
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/impi/2019.5.281/intel64/bin/mpivars.sh -ofi_internal=1 release
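In case the build recipe is relevant, that environment script corresponds to a definition file roughly like the sketch below. The base image and the way the Intel MPI / C++ run-time libraries end up under /opt/intel are illustrative assumptions here, not my literal recipe:

Bootstrap: docker
From: centos:7

%post
    # Illustrative assumptions: the Intel MPI 2019.5 runtime and the Intel
    # C++ run-time libraries are placed under /opt/intel inside the container
    # (the actual install/copy steps are omitted), and libfabric comes from
    # the distro packages.
    yum install -y libfabric

%environment
    source /opt/intel/bin/compilervars.sh intel64
    source /opt/intel/impi/2019.5.281/intel64/bin/mpivars.sh -ofi_internal=1 release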
This is not working so far. Below I illustrate with a simple test, run from inside the container (shell mode); the command hangs with no output for about 20-30 seconds and then produces the following error messages:
Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_DEBUG=500
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_PROVIDER=verbs
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_VERBS_IFACE="ib0"
Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_FABRICS=shm:ofi
Singularity image.sif:~/singularity/fv3-upp-apps> mpirun -n 78 -perhost 20 -hosts appro07,appro08,appro09,appro10 hostname
[mpiexec@appro07.internal.redlineperf.com] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:114): unable to run proxy on appro07 (pid 109898)
[mpiexec@appro07.internal.redlineperf.com] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:152): check exit codes error
[mpiexec@appro07.internal.redlineperf.com] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:205): poll for event error
[mpiexec@appro07.internal.redlineperf.com] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:731): error waiting for event
[mpiexec@appro07.internal.redlineperf.com] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1919): error setting up the boostrap proxies
I also tried calling mpirun with just one host (and only as many processes as fit on one host), with the same result.
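For reference, the single-host test was along these lines (one host from the list above, and a process count that fits on that host, given -perhost 20):
Singularity image.sif:~/singularity/fv3-upp-apps> mpirun -n 20 -perhost 20 -hosts appro07 hostname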
Is there a specific list of dependencies (e.g. do I need openssh-clients installed inside the container?) for this all-inside-the-container approach to work? I do not see anything in the Intel MPI 2019 Update 5 Developer Reference about running with Singularity containers.
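In case it helps narrow down the dependency question, these are the kinds of checks I can run from inside the container and report back on (I am assuming ssh is what the Hydra bootstrap would look for; fi_info is the standard libfabric utility for listing available providers):
Singularity image.sif:~/singularity/fv3-upp-apps> which ssh
Singularity image.sif:~/singularity/fv3-upp-apps> fi_info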
Thanks, Keith