Dear all,
I want to benchmark an implementation on cluster architectures. The number of processes need not be high, but I need as much memory as possible for each MPI process. For example, I have access to a large cluster where each node has two sockets and each socket has 6 multithreaded cores. Assume that I launch one MPI process per node (or per socket). Can I make that MPI process access the entire node's (or socket's) memory? Right now I can launch a single MPI process per socket, but the amount of memory the process sees is only that of a single core.
Thank you!