Hello,
I am running into trouble when using MPI_Recv in my program.
My program starts 3 processes and binds them to CPUs 1-3 respectively. In each process, I first disable interrupts, then send a message to each of the other processes and receive from them, and repeat this about a billion times.
I expect MPI_Recv to return within a fixed amount of time, which is why I use it rather than MPI_Irecv.
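(For comparison, if I did switch to non-blocking receives it would look roughly like the sketch below, polling with MPI_Test so the code could notice a late message. This is only an illustration, not what I actually run; the function and variable names are made up.)

#include <mpi.h>

/* Sketch only: a non-blocking receive that polls with MPI_Test so the
 * caller could notice when a message is late. Illustrative, not my real code. */
static void recv_nonblocking(double *buf, int count, int src, int tag, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Status  stt;
    int done = 0;

    MPI_Irecv(buf, count, MPI_DOUBLE, src, tag, comm, &req);
    while (!done) {
        MPI_Test(&req, &done, &stt);   /* returns immediately; done == 1 once the message has arrived */
        /* a timer or iteration counter here would show how long the wait really is */
    }
}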
To make the timing deterministic, I disable interrupts and cancel the scheduler tick on CPUs 1-3, migrate all other processes from CPUs 1-3 to CPU 0, and bind interrupts to CPU 0.
However, I found that very occasionally (roughly once in a billion iterations) MPI_Recv blocks for more than 600 ms, whereas normally it takes less than 10 ms.
I don't understand why MPI_Recv sometimes blocks for so long. Is there any way to find the reason and solve the problem?
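The only diagnostic I can think of so far is to wrap MPI_Recv and log just the rare slow calls using MPI_Wtime, roughly like this sketch (the 0.1 s threshold and the log format are placeholders I made up):

#include <mpi.h>
#include <stdio.h>

/* Sketch: time each receive and report only the rare outliers.
 * The 0.1 s threshold and the message format are arbitrary placeholders. */
static void recv_timed(void *buf, int count, MPI_Datatype type, int src,
                       int tag, MPI_Comm comm, MPI_Status *stt)
{
    double t0 = MPI_Wtime();
    MPI_Recv(buf, count, type, src, tag, comm, stt);
    double dt = MPI_Wtime() - t0;

    if (dt > 0.1) {                       /* only the slow calls are interesting */
        int rank;
        MPI_Comm_rank(comm, &rank);
        fprintf(stderr, "rank %d: MPI_Recv from %d took %.3f s\n", rank, src, dt);
    }
}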
I launch the program with mpirun -n 3, using the Hydra process manager and the shared-memory fabric.
Environment: parallel_studio_xe_2015_update2, Linux kernel 3.10
===== Processor composition =====
Processor name : Intel(R) Core(TM) i5-4590
Packages(sockets) : 1
Cores : 4
Processors(CPUs) : 4
Cores per package : 4
Threads per core : 1
void emt_comm()
{
    ......
    /* send to every other rank, then receive from every other rank */
    for (i = 0; i < ProcInfo.NumProc; i++) {
        if (i != ProcInfo.Id)
            MPI_Send(SendBuf, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, i + 1, ProcInfo.CommEMTCAL);
    }
    for (i = 0; i < ProcInfo.NumProc; i++) {
        if (i != ProcInfo.Id)
            MPI_Recv(buf22, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, ProcInfo.Id + 1, ProcInfo.CommEMTCAL, &MpiStt);
    }
}

void *thread_emt(__attribute__((unused)) void *arg)
{
    ......
    set_thread_affinity(core_id);      /* pin this thread to its own core */
    MPI_Barrier(ProcInfo.CommCAL);
    disabled_inter();                  /* disable interrupts on this core */
    for (step = 1; step <= 10000000; step++) {
        emt_comm();
        MPI_Barrier(ProcInfo.CommCAL);
    }
    open_inter();                      /* re-enable interrupts */
}

int main(int argc, char *argv[])
{
    ......
    isCalculationOver = 0;
    set_thread_affinity(0);            /* main thread stays on CPU 0 */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &ProcInfo.Id);
    MPI_Comm_size(MPI_COMM_WORLD, &ProcInfo.NumProc);
    core = ProcInfo.Id + 1;            /* ranks 0-2 run their worker thread on CPUs 1-3 */
    MPI_Barrier(MPI_COMM_WORLD);
    ......
    pthread_create(&thread, NULL, thread_emt, &core);
    ......
    while (1 != isCalculationOver)
        usleep(100 * 1000);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
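set_thread_affinity() is omitted above for brevity; it just pins the calling thread to one CPU, roughly like this simplified sketch (error handling omitted):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Simplified sketch: pin the calling thread to a single CPU. */
static void set_thread_affinity(int core_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}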