Hi,
I am looking to run IntelMPI over an RDMA-capable fabric. Since I am still acquiring IntelMPI, I was wondering how this setup usually runs.
1. Does IntelMPI run with one process per CPU (or hyperthread)? Or does it run with one thread per CPU (or hyperthread)?
2. How many queue pairs does each machine consume? For example, if I have M machines in the cluster and each machine has N cores, does each machine need
(a) M - 1 queue pairs (one QP per machine to talk to the M - 1 other machines in the cluster), or
(b) N * (M - 1) queue pairs (one QP per local core to talk to the M - 1 other machines in the cluster), or
(c) N * N * (M - 1) queue pairs (one QP per local core to talk to each of the N * (M - 1) remote cores in the cluster)?
3. By default, does IntelMPI use UD or RC mode?
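To make the three options in question 2 concrete, here is a minimal sketch of the arithmetic. The function name `qp_counts` and the example values M = 4, N = 8 are only my own illustration, not anything taken from IntelMPI, and the sketch says nothing about which option IntelMPI actually uses.

```python
def qp_counts(m, n):
    """Per-machine queue-pair counts for options (a), (b) and (c).

    m: number of machines in the cluster (hypothetical example input)
    n: number of cores per machine (hypothetical example input)
    """
    remote_machines = m - 1
    return {
        "a": remote_machines,          # one QP per remote machine
        "b": n * remote_machines,      # one QP per local core per remote machine
        "c": n * n * remote_machines,  # one QP per local core per remote core
    }


# Example: 4 machines with 8 cores each.
counts = qp_counts(4, 8)
print(counts)  # {'a': 3, 'b': 24, 'c': 192}
```

So even for a small cluster the gap between (a) and (c) is a factor of N * N, which is why I would like to know which model applies.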
Thanks
~Neelesh