Efficient method for hybrid OpenMP and MPI

Hi, I have a question about an efficient way to extend a well-implemented scientific code, written in C++ with OpenMP, with an MPI layer.

The code is an architecture-aware implementation (ccNUMA, affinity, caches, etc.) that exploits different aspects of the architecture and, in particular, uses all available threads.

The main goal is to add the MPI layer efficiently, without losing performance in the existing shared-memory code.

So I have to overlap MPI communication with OpenMP computation. My application allows this because it uses a loop-blocking technique.

In short: once the results of one block are ready to be sent to another MPI rank, the OpenMP threads can go on computing the next block. This scheme is repeated several times, after which a synchronization point is necessary. The whole structure is then run thousands of times.

The main requirement/limitation of the MPI communication is that a lot of small pieces of data have to be exchanged (many data bars of 1.5 KB or 3 KB each, taken from 3D arrays).
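
To give a feeling for the message sizes, here is a small sketch of what one such bar could look like. The element type and the layout are assumptions made for illustration (double precision, row-major storage with the last index fastest), and `Grid3D`, `bar` and `values_per_bar` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Bar sizes quoted above, in bytes.
constexpr std::size_t kBarBytesSmall = 1536;  // 1.5 KB
constexpr std::size_t kBarBytesLarge = 3072;  // 3 KB

// Assumption: values are doubles; the real element type is not stated.
constexpr std::size_t values_per_bar(std::size_t bar_bytes) {
    return bar_bytes / sizeof(double);
}

// Hypothetical row-major 3D array, index (i, j, k) with k fastest, so the
// bar at (i, j) is the contiguous run of nz values along k.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Grid3D(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z), data(x * y * z, 0.0) {}

    // Pointer to the first value of the bar at (i, j).
    double* bar(std::size_t i, std::size_t j) {
        return data.data() + (i * ny + j) * nz;
    }
};
```

Under these assumptions a 1.5 KB bar holds 192 values and a 3 KB bar holds 384, so each message is small compared with the per-message cost of the network.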

The code will run on fairly new hardware and software:

1. Intel CPU cluster
2. Intel MIC cluster: MPI communication between KNCs (and KNL, similar to 1.)
3. Hybrid: MPI communication between CPUs and MICs

The general question is how to do this in an efficient way: I am not asking about implementation details, but about which MPI scenarios can give the best performance.

In detail:

Does MPI communication cause any overhead on the cores? I mean when I run MPI communication and OMP computation at the same time, but on different memory regions.
Should I dedicate a separate core to MPI communication while the other cores perform OMP computation? Which scenario would be more efficient:
1. The OMP master, or a single thread bound to a single physical core, runs communication only; the other OMP threads use the other cores for computation
  - which kind of communication would be better here, synchronous or asynchronous?
2. A selected group of OMP threads handles MPI communication and computation, while the other OMP threads do computation only
3. Or some other solution?

In fact, scenario 2 above (a selected group of OMP threads doing both communication and computation) is the most suitable for my application, but then the programmer is responsible for guaranteeing the right MPI communication paths between MPI ranks and OMP threads.

If anyone can help me or share their experience, I will be very happy.

Lukasz
