Hi,
I use MPI (with InfiniBand RDMA enabled) over an mmapped region. The mmapped region is larger than physical memory, so I expect the TLB to be updated often, which may lead to 'undefined' behavior with InfiniBand RDMA. The application crashes the kernel (kernel panic), which is absolutely not acceptable (user application code should never be able to cause that).
I suspect that InfiniBand's RDMA capability, which bypasses the CPU's translation buffer and accesses physical memory directly, is one of the reasons the OS gets corrupted 'somehow'.
1. Can the RDMA capability cause the kind of problem I described?
2. I tried disabling 'I_MPI_DAPL_TRANSLATION_CACHE', but the issue is not resolved. (I don't see any message saying 'I_MPI_DAPL_TRANSLATION_CACHE=0' even with 'I_MPI_DEBUG=100'; am I misusing the env. vars?)
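For reference, a rough sketch of how the two variables can be set, assuming a bash-like shell and Intel MPI's mpirun launcher. The binary name './my_app' and the rank count are placeholders, not the real job:

```shell
# Turn off the memory registration (translation) cache in the DAPL path
export I_MPI_DAPL_TRANSLATION_CACHE=0
# Request verbose Intel MPI debug output
export I_MPI_DEBUG=100
# Placeholder launch line: 2 ranks, hypothetical binary name
mpirun -n 2 ./my_app
```

If the variables are exported in a different shell or only on the launch node, the ranks may not see them, which could be one reason nothing shows up in the debug output.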
Many thanks to all :D