
MPI_Recv blocks for a long time


hello:
    I am running into trouble when using MPI_Recv in my program.
    My program starts 3 subprocesses and binds them to CPUs 1-3 respectively. In each subprocess it first disables interrupts, then sends a message to every other process and receives one from each of them. This exchange is repeated on the order of a billion times.
    I expect MPI_Recv to return within a fixed time, which is why I did not use MPI_Irecv instead.
    To achieve that, I disabled interrupts and the timer ticks on CPUs 1-3, moved all other processes off CPUs 1-3 onto CPU 0, and bound interrupts to CPU 0.
    However, I found that very rarely (roughly once in a billion iterations) MPI_Recv blocks for more than 600 ms, whereas normally it takes less than 10 ms.
    I don't know why MPI_Recv sometimes blocks for so long. Is there any way to find the reason and solve the problem?
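
    For example, to catch the outliers, every receive could be routed through a small timing wrapper like the sketch below (the name timed_recv, the 100 ms threshold, and the stderr logging are only illustrative, not part of my program):

#include <mpi.h>
#include <stdio.h>

/* Illustrative wrapper only: receive with the same arguments as the
 * MPI_Recv call in emt_comm(), measure how long the call blocks, and
 * log anything slower than 100 ms (normal calls finish in < 10 ms). */
static int timed_recv(void *buf, int count, MPI_Datatype type, int src,
                      int tag, MPI_Comm comm, MPI_Status *status, int myrank)
{
    double t0 = MPI_Wtime();
    int rc = MPI_Recv(buf, count, type, src, tag, comm, status);
    double dt = MPI_Wtime() - t0;

    if (dt > 0.1)
        fprintf(stderr, "rank %d: MPI_Recv from %d blocked for %.3f s\n",
                myrank, src, dt);
    return rc;
}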

   I launch the program with mpirun -n 3, using the Hydra process manager and the shared-memory fabric.
   Environment: parallel_studio_xe_2015_update2, Linux 3.10
=====  Processor composition  =====
Processor name    : Intel(R) Core(TM) i5-4590  
Packages(sockets) : 1
Cores             : 4
Processors(CPUs)  : 4
Cores per package : 4
Threads per core  : 1
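
For reference, the launch command looks roughly like this (the binary name ./emt_app is just a placeholder; I_MPI_FABRICS=shm selects the shared-memory fabric in Intel MPI):

I_MPI_FABRICS=shm mpirun -n 3 ./emt_app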

void emt_comm()
{
    ......
    /* Send one message to every other rank; the tag is the destination rank + 1. */
    for (i = 0; i < ProcInfo.NumProc; i++)
    {
        if (i != ProcInfo.Id)
            MPI_Send(SendBuf, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, i+1, ProcInfo.CommEMTCAL);
    }

    /* Receive one message from every other rank; the tag is this rank's Id + 1. */
    for (i = 0; i < ProcInfo.NumProc; i++)
    {
        if (i != ProcInfo.Id)
            MPI_Recv(buf22, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i, ProcInfo.Id+1, ProcInfo.CommEMTCAL, &MpiStt);
    }
}
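
For comparison, a nonblocking version of the same exchange (the MPI_Irecv route I decided not to take) would look roughly like the sketch below; recv_bufs and MAX_PROC are placeholders, since each pending receive would need its own buffer rather than the single buf22 reused above:

/* Sketch only, not part of the real program: post all receives, then all
 * sends, then wait for everything.  recv_bufs[i] is a separate buffer per
 * source rank, unlike the single buf22 reused by the blocking loop. */
MPI_Request reqs[2 * MAX_PROC];
int nreq = 0;

for (i = 0; i < ProcInfo.NumProc; i++)
{
    if (i != ProcInfo.Id)
        MPI_Irecv(recv_bufs[i], EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i,
                  ProcInfo.Id+1, ProcInfo.CommEMTCAL, &reqs[nreq++]);
}
for (i = 0; i < ProcInfo.NumProc; i++)
{
    if (i != ProcInfo.Id)
        MPI_Isend(SendBuf, EMT_COMM_NUM, MPI_DOUBLE_PRECISION, i,
                  i+1, ProcInfo.CommEMTCAL, &reqs[nreq++]);
}
MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);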




void *thread_emt(__attribute__((unused)) void *arg)
{
    ......
    /* Pin this worker thread to its dedicated core (1-3). */
    set_thread_affinity(core_id);
    MPI_Barrier(ProcInfo.CommCAL);
    /* Turn interrupts off on this core for the duration of the loop. */
    disabled_inter();
    for (step = 1; step <= 10000000; step++)
    {
        emt_comm();
        MPI_Barrier(ProcInfo.CommCAL);
    }
    /* Re-enable interrupts before the thread exits. */
    open_inter();
    return NULL;
}


int main(int argc, char *argv[])
{
    ......
    isCalculationOver = 0;
    /* Keep the main thread on CPU 0; the worker thread will run on core Id+1. */
    set_thread_affinity(0);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &ProcInfo.Id);
    MPI_Comm_size(MPI_COMM_WORLD, &ProcInfo.NumProc);
    core = ProcInfo.Id + 1;
    MPI_Barrier(MPI_COMM_WORLD);
    ......
    /* Start the communication thread and wait until it signals completion. */
    pthread_create(&thread, NULL, thread_emt, &core);
    ......
    while (1 != isCalculationOver)
        usleep(100*1000);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

    return 0;
}
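
One thing I am not sure about: MPI calls are made both from the main thread and from thread_emt, while plain MPI_Init only guarantees MPI_THREAD_SINGLE. A safer initialization would be something like the sketch below (whether this is related to the stall, I don't know):

/* Sketch: request a multithreaded MPI level instead of plain MPI_Init,
 * because both the main thread and thread_emt call into MPI. */
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE)
    fprintf(stderr, "MPI only provides thread level %d\n", provided);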

 

