MPI_Comm_rank, MPI_THREAD_MULTIPLE, and performance

Hi everyone,

We found the following behavior in Intel MPI (5.0.3) with both the Intel compilers and GCC:

In a hybrid OpenMP/MPI program, MPI_Comm_rank becomes much slower if MPI is initialized with MPI_THREAD_MULTIPLE. The two source files at the end of this post show the behavior. They can be compiled with

mpiicpc main.cpp -o test.exe -openmp
mpiicpc main2.cpp -o test_nothreads.exe -openmp

Both executables run a simple OpenMP-parallelized for loop twice. The first pass only performs an arithmetic operation in each of the 1,000,000 iterations. The second pass additionally calls MPI_Comm_rank ten times per iteration.

test.exe uses MPI_THREAD_MULTIPLE. Here are typical runtimes in seconds for the two loops (with one thread, OMP_NUM_THREADS=1):

MPI_THREAD_MULTIPLE w/o rank: 0.0411851
MPI_THREAD_MULTIPLE w rank: 1.03309

test_nothreads.exe doesn't use MPI_THREAD_MULTIPLE (it calls plain MPI_Init), and we get:

w/o rank: 0.0452909
w rank: 0.181268

The slowdown becomes much more severe with more threads, for example with 16 OpenMP threads:

MPI_THREAD_MULTIPLE w rank: 6.07238
versus
w rank: 0.345186
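
To put these timings in perspective, here is a back-of-the-envelope sketch that converts them into an approximate cost per MPI_Comm_rank call and a slowdown factor. It assumes the "w/o rank" time is a fair baseline for the loop body, which is only approximately true because the two timed regions do not contain exactly the same allocation and deallocation work. The function names are hypothetical helpers, not part of the test programs.

```cpp
#include <cassert>

// Estimated extra time per MPI_Comm_rank call, in nanoseconds.
// Assumption: the loop without the calls is a fair baseline for the loop with them.
double per_call_overhead_ns(double seconds_with_calls,
                            double seconds_without_calls,
                            long calls)
{
    assert(calls > 0);
    return (seconds_with_calls - seconds_without_calls) / calls * 1e9;
}

// Ratio of two measured runtimes.
double slowdown(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

The second loop makes 1,000,000 x 10 = 10,000,000 calls. With one thread, per_call_overhead_ns(1.03309, 0.0411851, 10000000) gives roughly 99 ns per call with MPI_THREAD_MULTIPLE, while per_call_overhead_ns(0.181268, 0.0452909, 10000000) gives roughly 13.6 ns without it. For the "w rank" loop, slowdown(1.03309, 0.181268) is about 5.7 with one thread and slowdown(6.07238, 0.345186) is about 17.6 with 16 threads.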

Using a profiler, we find that a spin lock in MPI_Comm_rank is responsible for the slowdown.

I understand that with MPI_THREAD_MULTIPLE some locking is needed for MPI operations. However, I do not see why this should be the case in MPI_Comm_rank, since I assume it is a purely local operation. In Open MPI, for example, it internally just returns a member of a struct, namely the rank of the calling process.
Therefore, I would like to understand whether this is a known problem or a bug.
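
As a rough illustration of the difference between a getter that only reads a struct member and one that takes a spin lock first, here is a toy model in plain C++. It is a sketch under assumptions: ToyComm, rank_unlocked, rank_locked, and sum_with_cached_rank are hypothetical names, and nothing here reflects the actual internals of Intel MPI or Open MPI. It only shows why a lock shared by all threads turns a trivial read into a point of contention, and how querying the value once outside the hot loop avoids paying that cost repeatedly.

```cpp
#include <atomic>

// Hypothetical stand-in for a communicator object (not a real MPI data structure).
struct ToyComm {
    int rank = 0;
    std::atomic<bool> locked{false};  // toy spin lock shared by all threads
};

// Getter that just reads a struct member: no synchronization at all.
inline int rank_unlocked(const ToyComm& comm)
{
    return comm.rank;
}

// Getter that acquires a spin lock before reading, as a thread-safe
// library build might do (assumption, for illustration only).
inline int rank_locked(ToyComm& comm)
{
    while (comm.locked.exchange(true, std::memory_order_acquire)) {
        // busy-wait until the lock holder releases it
    }
    const int r = comm.rank;
    comm.locked.store(false, std::memory_order_release);
    return r;
}

// Workaround pattern: query the rank once, then reuse the cached value
// inside the hot loop instead of calling the locked getter every iteration.
long sum_with_cached_rank(ToyComm& comm, long n)
{
    const int rank = rank_locked(comm);  // single locked call
    long sum = 0;
    for (long i = 0; i < n; ++i)
        sum += rank;
    return sum;
}
```

In the real test program, the equivalent of the last function would be to call MPI_Comm_rank once before the "#pragma omp parallel for" loop and read the stored value inside it. This is safe because the rank of a process in a given communicator does not change.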

All the best, Christoph.

I cannot attach .cpp files for some reason, so here is the code:

main2.cpp:

#include "mpi.h"
#include "omp.h"

#include <iostream>
#include "math.h"
#include "stdlib.h"

int main(int argc,char* args[])
{
	MPI_Init(NULL,NULL);
	long n=1000000;
	double start = MPI_Wtime();
	// value-initialize to zero so the loop does not read indeterminate values
	double *d = new double[n]();
	double *d2 = new double[n]();

#pragma omp parallel for
	for(long i=0;i<n;i++)
	{

		d2[i] = cos(d[i])*pow(d[i],3.0);


	}
	delete[] d;
	delete[] d2;


	double end1 = MPI_Wtime();
	std::cout << "w/o rank: "<< end1-start << std::endl;

	// value-initialize to zero so the loop does not read indeterminate values
	d = new double[n]();
	d2 = new double[n]();

#pragma omp parallel for
	for(long i=0;i<n;i++)
	{
		int myProcID;
		for(int j=0;j<10;j++)
			MPI_Comm_rank(MPI_COMM_WORLD,&myProcID);
		d2[i] = cos(d[i])*pow(d[i],3.0);


	}

	double end2 = MPI_Wtime();
	std::cout << "w rank: "<< end2-end1 << std::endl;
	// free the second pair of arrays (outside the timed region)
	delete[] d;
	delete[] d2;
	MPI_Finalize();

	return 0;

}

main.cpp:

#include "math.h"
#include "mpi.h"
#include "omp.h"

#include <iostream>
#include "stdlib.h"

int main(int argc,char* args[])
{
	int required = MPI_THREAD_MULTIPLE;
	int provided = 0;
	MPI_Init_thread(NULL,NULL,required,&provided);
	if(provided!=required)
	{
		// std::endl flushes the message before abort() terminates the process
		std::cout << "Error: MPI thread support insufficient! required "<< required << " provided "<< provided << std::endl;
		abort();
	}
	long n=1000000;
	double start = MPI_Wtime();
	// value-initialize to zero so the loop does not read indeterminate values
	double *d = new double[n]();
	double *d2 = new double[n]();

#pragma omp parallel for
	for(long i=0;i<n;i++)
	{

		d2[i] = cos(d[i])*pow(d[i],3.0);


	}
	delete[] d;
	delete[] d2;


	double end1 = MPI_Wtime();
	std::cout << "MPI_THREAD_MULTIPLE w/o rank: "<< end1-start << std::endl;

	// value-initialize to zero so the loop does not read indeterminate values
	d = new double[n]();
	d2 = new double[n]();

#pragma omp parallel for
	for(long i=0;i<n;i++)
	{
		int myProcID;
		for(int j=0;j<10;j++)
			MPI_Comm_rank(MPI_COMM_WORLD,&myProcID);
		d2[i] = cos(d[i])*pow(d[i],3.0);


	}

	double end2 = MPI_Wtime();
	std::cout << "MPI_THREAD_MULTIPLE w rank: "<< end2-end1 << std::endl;
	// free the second pair of arrays (outside the timed region)
	delete[] d;
	delete[] d2;
	MPI_Finalize();

	return 0;

}
