Hello,
I am compiling and running a large electronic structure program on an NSF supercomputer, using the intel/15.0.2 Fortran compiler and impi/5.0.2, the latest Intel MPI library installed on the system.
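For context, the build step looks roughly like the following; this is only a sketch, where the module names are the ones above but the source file name, output name, and optimization level are placeholders:
module load intel/15.0.2 impi/5.0.2
mpiifort -O2 -qopenmp -o program.x main.f90    # Intel MPI Fortran wrapper; -qopenmp enables the OpenMP parts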
The program uses hybrid parallelization (MPI and OpenMP). When I run it on a molecule using 4 MPI tasks on a single node (no OpenMP threading anywhere here), I obtain the correct result.
However, when I spread the same 4 tasks across 2 nodes (still 4 tasks total, just 2 per node), I get what appear to be numerical-/precision-related errors.
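For reference, the two launch configurations are roughly as follows (the executable name is a placeholder; -n is the total number of MPI tasks and -ppn the number of tasks per node):
# 4 tasks on a single node: correct results
mpiexec.hydra -n 4 -ppn 4 ./program.x
# 4 tasks spread over 2 nodes (2 per node): apparent numerical/precision errors
mpiexec.hydra -n 4 -ppn 2 ./program.x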
Following Michael Steyer's (Intel) slides on Intel MPI conditional reproducibility (http://goparallel.sourceforge.net/wp-content/uploads/2015/06/PUM21-3-Int...), I selected topology-unaware algorithms for all collective operations by running mpiexec.hydra with the following flags (the assembled launch line is sketched after the list):
-genv I_MPI_DEBUG 100
-genv I_MPI_ADJUST_ALLGATHER 1
-genv I_MPI_ADJUST_ALLGATHERV 1
-genv I_MPI_ADJUST_ALLREDUCE 2
-genv I_MPI_ADJUST_ALLTOALL 1
-genv I_MPI_ADJUST_ALLTOALLV 1
-genv I_MPI_ADJUST_ALLTOALLW 1
-genv I_MPI_ADJUST_BARRIER 1
-genv I_MPI_ADJUST_BCAST 1
-genv I_MPI_ADJUST_EXSCAN 1
-genv I_MPI_ADJUST_GATHER 1
-genv I_MPI_ADJUST_GATHERV 1
-genv I_MPI_ADJUST_REDUCE 1
-genv I_MPI_ADJUST_REDUCE_SCATTER 1
-genv I_MPI_ADJUST_SCAN 1
-genv I_MPI_ADJUST_SCATTER 1
-genv I_MPI_ADJUST_SCATTERV 1
-genv I_MPI_ADJUST_REDUCE_SEGMENT 1:14000
-genv I_MPI_STATS_SCOPE "topo"
-genv I_MPI_STATS "ipm"
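Putting these together, the assembled launch line looks roughly like this (the executable name and placement flags are placeholders; the -genv settings are exactly the ones listed above):
mpiexec.hydra \
    -genv I_MPI_DEBUG 100 \
    -genv I_MPI_ADJUST_ALLREDUCE 2 \
    ... (remaining -genv settings from the list above) ... \
    -n 4 -ppn 2 ./program.x
Equivalently, the same I_MPI_* variables can be exported in the job script before the mpiexec.hydra call.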
This lets my job proceed further than before; however, it still dies with what appear to be numerical-/precision-related errors.
My question is: what other topology-aware settings exist that I could try to disable, so that multi-node runs reproduce the correct results I obtain when the MPI tasks run on only a single node? I have pored through the Intel MPI reference manual and haven't found anything beyond the variables above.
Please note that multi-node runs sometimes do work, e.g., 2 MPI tasks total spread over two nodes; it really looks to me like a strange topology issue. Another note: building and running with the latest installed versions of Open MPI and MVAPICH2 consistently dies with segmentation faults, so those libraries aren't really an option here. I see the same behavior regardless of which nodes are allocated to me, and I have tested this many times.
Thank you very much in advance for your help!
Best,
Andrew