Hybrid OpenMP/MPI doesn't work with the Intel compiler

Greetings,

We provide the full Intel Cluster compiler suite of software on our cluster which uses Torque/Moab. Last week one of my users complained that his Hybrid OpenMP/MPI code wasn't running properly. The OpenMP portion was running great, but the MPI wasn't splitting the job up across nodes. So I dug into it a bit. Sure enough, the job launches on $X nodes but each node gets the full range of work and isn't split up.

To ensure that this was not a problem in his code, I slapped together a really basic hello world program with MPI and OpenMP and confirmed the same behaviour. Not only that, but it works just fine if I compile it with GCC instead! Hrm. Well, I hadn't upgraded the Intel tool set in a few months and I knew of at least one update, so maybe that was the problem. So I updated all of the toolsets that I have access to. _Everything_ is now up to date (as of yesterday). I tried again and got the exact same results.

$ mpif90 --version
GNU Fortran (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
$ mpiifort --version
ifort (IFORT) 14.0.0 20130728

Well, maybe it is my code. I am more of a sysadmin than a programmer. I found this code snippet out in the wild and tried it: http://www.rcac.purdue.edu/userinfo/resources/common/compile/hybrid_hell...

Compile: mpiifort -openmp -mt_mpi hybrid_hello.f90
Run with the option of two hosts and two OpenMP threads [ `export OMP_NUM_THREADS=2` ] for testing.
Output of run:

SERIAL REGION:     Runhost:node03                           Rank:           0  of            1 ranks, Thread:           0  of            1  threads   hello, world
PARALLEL REGION:   Runhost:node03                           Rank:           0  of            1 ranks, Thread:           0  of            2  threads   hello, world
PARALLEL REGION:   Runhost:node03                           Rank:           0  of            1 ranks, Thread:           1  of            2  threads   hello, world
SERIAL REGION:     Runhost:node03                           Rank:           0  of            1 ranks, Thread:           0  of            1  threads   hello, world
SERIAL REGION:     Runhost:node01                           Rank:           0  of            1 ranks, Thread:           0  of            1  threads   hello, world
PARALLEL REGION:   Runhost:node01                           Rank:           0  of            1 ranks, Thread:           0  of            2  threads   hello, world
PARALLEL REGION:   Runhost:node01                           Rank:           0  of            1 ranks, Thread:           1  of            2  threads   hello, world
SERIAL REGION:     Runhost:node01                           Rank:           0  of            1 ranks, Thread:           0  of            1  threads   hello, world

I am given two hosts by Torque/Moab and I get two OpenMP threads, but there is only 1 rank! To quote Adam Savage, "Well, there's the problem!" For whatever reason, each node seems to think that it is the only MPI rank in the job. This is pretty much what I had been seeing, but it is much better code than mine so I feel better about showing its results. :-)
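The symptom above (every host reporting itself as rank 0 of 1) can be checked mechanically on larger runs. What follows is only a minimal sketch, assuming the output format printed by the hybrid_hello program above (`Runhost:<host>` followed by `Rank: <r> of <n> ranks`); the function name `summarize_ranks` is hypothetical and not part of any MPI toolset.

```shell
# Hypothetical helper: print each distinct "host rank R of N" combination
# found in hybrid_hello-style output read from standard input.
summarize_ranks() {
    awk '
        /Runhost:/ && /Rank:/ {
            host = ""; rank = ""; size = ""
            for (i = 1; i <= NF; i++) {
                # "Runhost:" is 8 characters, so the host name starts at 9.
                if ($i ~ /^Runhost:/) { host = substr($i, 9) }
                # Fields after "Rank:" are: <rank> of <size>
                if ($i == "Rank:")    { rank = $(i + 1); size = $(i + 3) }
            }
            if (host != "" && size != "") {
                key = host " rank " rank " of " size
                # Remember first-seen order so the summary is stable.
                if (!(key in seen)) { seen[key] = 1; order[++n] = key }
            }
        }
        END { for (j = 1; j <= n; j++) print order[j] }
    '
}

# Usage (assumed file name): summarize_ranks < job.out
```

Fed the Intel run above, this would list `node03 rank 0 of 1` and `node01 rank 0 of 1`; a correctly launched two-rank job would instead show two different rank numbers, each "of 2".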

What happens with GCC?
Compile: mpif90 -lgomp -fopenmp hybrid_hello.f90
Run with the exact same script/submission process as before.
Output of run:

SERIAL REGION:     Runhost:node01                           Rank:           0  of            2 ranks, Thread:           0  of            1  threads   hello, world
PARALLEL REGION:   Runhost:node01                           Rank:           0  of            2 ranks, Thread:           0  of            2  threads   hello, world
PARALLEL REGION:   Runhost:node01                           Rank:           0  of            2 ranks, Thread:           1  of            2  threads   hello, world
SERIAL REGION:     Runhost:node01                           Rank:           0  of            2 ranks, Thread:           0  of            1  threads   hello, world
SERIAL REGION:     Runhost:node02                           Rank:           1  of            2 ranks, Thread:           0  of            1  threads   hello, world
PARALLEL REGION:   Runhost:node02                           Rank:           1  of            2 ranks, Thread:           0  of            2  threads   hello, world
PARALLEL REGION:   Runhost:node02                           Rank:           1  of            2 ranks, Thread:           1  of            2  threads   hello, world
SERIAL REGION:     Runhost:node02                           Rank:           1  of            2 ranks, Thread:           0  of            1  threads   hello, world

Well, look at that. It runs just fine and as expected with GCC, but the Intel build isn't getting the MPI ranks right at all. At this point, I am fairly certain it is an Intel compiler issue. Knowing that, I combed the forums and documentation that Intel provides looking for answers. I found a lot, but nothing really jumped out at me until I found this nifty hello world application: http://software.intel.com/en-us/articles/beginning-hybrid-mpiopenmp-deve...

Now for retesting using the Intel provided code. Surely this will run right. After all, the guide uses the same compile options I have been using!

Compile: mpiifort -openmp -mt_mpi hybrid-hello.f90
Run with the option of two hosts and two OpenMP threads [ `export OMP_NUM_THREADS=2` ] for testing.
Output of run:

Hello from thread   0 of   2 in rank   0 of   1 on node01
Hello from thread   1 of   2 in rank   0 of   1 on node01
Hello from thread   0 of   2 in rank   0 of   1 on node03
Hello from thread   1 of   2 in rank   0 of   1 on node03

Not a great start. Same output I have been getting. What happens with GCC?

Compile: mpif90 -lgomp -fopenmp hybrid-hello.f90
Run with the exact same script/submission process as before.
Output of run:

Hello from thread   1 of   2 in rank   0 of   1 on node02
Hello from thread   0 of   2 in rank   0 of   1 on node02
Hello from thread   0 of   2 in rank   1 of   2 on node03
Hello from thread   1 of   2 in rank   1 of   2 on node03

It works with GCC! What? I am now zero for three on the Hybrid OpenMP/MPI problem (well, zero for four if you count the user who brought this to my attention). What else could it be? I wonder if it doesn't like the mpirun that I got with Torque/Moab. I do have access to (and have already installed) the Intel MPI toolsets. Well, instead of using the Torque mpirun, I will try the Intel mpirun!

And... no. Not only does it lack the pernode parameter, but it seems to be missing a few other features as well. I finally got it to run with `mpirun -bynode -np 2 a.out`, because Torque is allocating to the job 2 cores on 2 hosts and I want it to launch *only* one MPI process per host. Anyway, it finally runs, with the exact same output as before. That said, I am still not convinced that I have this limited version of mpirun from Intel configured with the right options yet, and I can't use mpiexec because the Intel mpiexec doesn't appear to recognize the Torque/Moab directives.
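With two MPI installations on the same cluster, one thing that may be worth ruling out is whether the launcher found first on `PATH` belongs to the same installation as the compiler wrapper that built the binary. A launcher from a different MPI implementation can start each process on its own, which would look like "rank 0 of 1" on every host. The following is a minimal sketch rather than a diagnosis; the function name `report_mpi_tools` is hypothetical, and the default tool list is simply the set of commands mentioned in this post.

```shell
# Hypothetical helper: show where each MPI-related command resolves on PATH.
# With no arguments it checks the launchers and wrappers named in this post.
report_mpi_tools() {
    if [ "$#" -eq 0 ]; then
        set -- mpirun mpiexec mpiifort mpif90
    fi
    for tool in "$@"; do
        # command -v prints the resolved path, or fails if nothing is found.
        location=$(command -v "$tool" 2>/dev/null) || location="(not found)"
        printf '%s -> %s\n' "$tool" "$location"
    done
}

# Usage inside the job script, just before launching:
#   report_mpi_tools
# Assuming a.out is dynamically linked, the MPI library it actually loads
# can be listed with:
#   ldd ./a.out | grep -i mpi
```

If the `mpirun` path and the MPI library shown by `ldd` come from different installation trees, that mismatch would be the first thing to look at before blaming the compiler.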

So the question is, what am I doing wrong? I can't seem to get the Intel-compiled version of this code to run in a proper Hybrid OpenMP/MPI configuration. It obviously works with GCC, so I am fairly convinced it isn't the code. The cluster is running great with other Intel MPI jobs using the Torque-provided mpirun, so I don't think it is that (though I might get better performance if I can get the Intel mpirun tweaked and working properly). That really only leaves the Intel compiler, and I am stumped on why it isn't working.

Any help would be greatly appreciated.

Thank you!
