I'm using MPI to run processes that are nearly independent. They only communicate at the very end, in an MPI_GATHER operation. My machine has a 4-core, 8-thread CPU. I run it with:
mpirun -n 101 ./a.out
When I do so, htop shows that it uses 100% of all 8 hardware threads. How do I bind the processes to the 4 physical cores only? (I tried '-map-by core'.)
Also, all the processes seem to be running concurrently, each at roughly 3–8% CPU. Wouldn't it be more efficient if each process ran at 100% until it reached the MPI_GATHER call?
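As a rough check of the per-process figure htop reports (where 100% means one hardware thread), here is a minimal sketch assuming all 101 ranks are CPU-bound and the OS scheduler shares time evenly across the 8 logical CPUs. The function name `expected_cpu_percent` is hypothetical, for illustration only.

```python
def expected_cpu_percent(hw_threads: int, n_procs: int) -> float:
    """Approximate per-process CPU% as htop reports it (100% = one hardware thread).

    Assumes every process is CPU-bound and time is shared evenly (an idealization).
    """
    if hw_threads <= 0 or n_procs <= 0:
        raise ValueError("hw_threads and n_procs must be positive")
    # A single busy process can use at most one hardware thread (100%);
    # when processes outnumber threads, the total of 100% * hw_threads is split evenly.
    return min(100.0, 100.0 * hw_threads / n_procs)


if __name__ == "__main__":
    # 8 logical CPUs (4 cores x 2 threads) shared by 101 ranks
    print(f"{expected_cpu_percent(8, 101):.1f}%")
```

Under these assumptions it prints 7.9%, which is consistent with the upper end of the 3–8% seen in htop.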