Hello
We have some dual-Xeon E5v3 hosts (running RHEL6) with a TrueScale HCA plugged into a PCIe slot attached to socket #1. When running the Intel MPI Benchmarks, I see better latency and bandwidth when binding the processes to socket #0 (1.27us vs 1.42us for socket #1), which looks wrong since the HCA is local to socket #1.

Given that a significant part of the TrueScale network processing is done in software, I wonder whether this could be caused by the locality of some software tasks or buffers. I see that the qib0 interrupts go to socket #0, but changing their affinity to socket #1 doesn't seem to fix this strangeness. Is there something else to migrate? Is there any kernel module parameter to allocate the driver's buffers near socket #1, where they should be?
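In case the details matter, this is roughly how I tested. The numactl binding is just one way of pinning the ranks, and the cpumask/IRQ numbers below are placeholders, not the real values:

    # which NUMA node the kernel thinks the HCA is attached to
    cat /sys/class/infiniband/qib0/device/numa_node

    # bind the benchmark ranks and their memory to one socket (socket #0 shown here),
    # with the usual one-rank-per-node placement
    mpirun -np 2 numactl --cpunodebind=0 --membind=0 ./IMB-MPI1 PingPong

    # find the qib0 interrupt(s) and move them to socket #1's cores
    grep qib /proc/interrupts
    echo <socket1-cpumask> > /proc/irq/<qib-irq>/smp_affinity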
thanks
Brice