Channel: Clusters and HPC Technology

My MPI program hangs when I launch processes on different nodes (hosts)


My MPI program doesn't work (it hangs) when I launch its processes on different nodes (hosts). The program calls MPI_Win_allocate_shared to allocate shared memory through an RMA window. What could be causing this? Do I actually need to implement intercommunicators for this purpose? Here's the code:

MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, proc_rank, MPI_INFO_NULL, &comm_sm);
MPI_Comm_rank(comm_sm, &rank_sm);
MPI_Comm_size(comm_sm, &numprocs_sm);

MPI_Info info_noncontig;
MPI_Info_create(&info_noncontig);
MPI_Info_set(info_noncontig, "alloc_shared_noncontig", "true");

int disp_size = sizeof(ullong);
MPI_Aint array_size = number_of_items * disp_size;
MPI_Win_allocate_shared(array_size, disp_size, info_noncontig, comm_sm, &array, &win_sm);
MPI_Win_shared_query(win_sm, 0, &array_size, &disp_size, &array);

MPI_Barrier(comm_sm);

ullong i_start = proc_rank * number_of_items / (ullong)numprocs;
ullong i_end = (proc_rank + 1) * number_of_items / (ullong)numprocs;

MPI_Win_lock_all(MPI_MODE_NOCHECK, win_sm);

if (proc_rank == 0)
{
    ullong value = number_of_items - 1;
    srand((unsigned)time(NULL) + proc_rank * numprocs + namelen);
    for (ullong index = 0; index < number_of_items; index++, value--)
        array[index] = (rand_mode == 1) ? rand() % rand_seed + 1 : value;
}

MPI_Barrier(comm_sm);

for (ullong index = i_start; index <= i_end; index++)
    fprintf(stdout, "%llu ", array[index]);

fprintf(stdout, "\n\n");
fflush(stdout);

MPI_Barrier(comm_sm);

Output:

[COMP-PC.MYHOME.NET@mpiexec] Process 0 of 2
71 81 12 56 66 49 70 39 100 90 27 57 46 66 6 13 39 20 70 4 6 13 16 5
 56 60 90 44 97 5 87 51 44 12 7 54 70 5 29 65 95 69 70 44 45 38 87 1 9 80 54 78
67 77 68 13 16 78 79 40 98 50 74 6 52

[WIN-9MFH3O78GLQ.MYHOME.NET@mpiexec] Process 1 of 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, process 1 prints only zeros, so it apparently doesn't receive the address of the array buffer that process 0 filled.

