From the MPI 3.1 specification:
This is a collective call executed by all processes in the group of comm. On each
process, it allocates memory of at least size bytes that is shared among all processes in
comm, and returns a pointer to the locally allocated segment in baseptr that can be used
for load/store accesses on the calling process. The locally allocated memory can be the
target of load/store accesses by remote processes; the base pointers for other processes
can be queried using the function MPI_WIN_SHARED_QUERY. The call also returns a
window object that can be used by all processes in comm to perform RMA operations.
The size argument may be different at each process and size = 0 is valid. It is the user's
responsibility to ensure that the communicator comm represents a group of processes that
can create a shared memory segment that can be accessed by all processes in the group.
On a single SMP host with multiple ranks, it is clear that this call can be used to construct a window onto a multi-process shared memory buffer, which can then be accessed (with care) either with direct load/store instructions or by way of RMA operations. Note that each rank/process may see a different virtual base address in baseptr.
The MPI 3.1 specification states (or at least implies) that all processes in the group of comm must be able to access the same physical memory (which may be mapped at different virtual addresses in different processes).
Now, as a simplification of my query, consider the case of, say, 8 processes running on 2 hosts, 4 processes per host (where the hosts have no sharable memory between them).
Can all 8 processes call MPI_WIN_ALLOCATE_SHARED on MPI_COMM_WORLD, returning 8 win objects (4 per host), with:
4 processes on host 0 having shared memory (and direct access by those processes)
4 processes on host 1 having shared memory (different from host 0, and direct access by those processes)
All 8 processes having RMA access to every process's window.
What I wish to do is improve the performance of intra-host access without excluding inter-host access, and without requiring each process to use 2 windows to do so.
Note, I am not currently set up to run this test.
Jim Dempsey