Channel: Clusters and HPC Technology
ScaLAPACK raises an error under certain circumstances

Dear All,

      I am using Intel MPI + ifort + MKL to compile Quantum-Espresso 6.1. Everything works fine except for calls to ScaLAPACK routines: calls to PDPOTRF may exit with a non-zero error code under certain circumstances. For example, with 2 nodes * 8 processors per node the program works, but with 4 nodes * 4 processors per node it fails. With I_MPI_DEBUG set, the failing case prints the following messages just before the call exits with code 970, while the working case prints no such messages:

[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676900, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675640, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x26742b8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676b58, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x26769c8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676c20, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675fa0, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676068, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676a90, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676e78, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2678778, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675898, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675a28, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675bb8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674f38, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676ce8, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2676130, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674768, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674448, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2674b50, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675e10, operation 2, size 12272, lkey 1879682311
[10#18754:18754@node09] MRAILI_Ext_sendq_send(): rail 0,vbuf 0x2675708, operation 2, size 2300, lkey 1879682311
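
As a reading aid for the error code, here is a minimal sketch of how the INFO output argument of PDPOTRF is conventionally interpreted in ScaLAPACK. It assumes that the exit code 970 is the INFO value returned by PDPOTRF, which is not confirmed in this post; the helper name `describe_pdpotrf_info` is hypothetical and used only for illustration.

```python
def describe_pdpotrf_info(info: int) -> str:
    """Interpret the INFO output argument of ScaLAPACK's PDPOTRF.

    Convention: 0 means success, K > 0 means the leading minor of
    order K is not positive definite, and a negative value flags an
    illegal argument (-(i*100 + j) for entry j of array argument i,
    or -i for scalar argument i).
    """
    if info == 0:
        return "success"
    if info > 0:
        return f"leading minor of order {info} is not positive definite"
    code = -info
    if code >= 100:
        arg, entry = divmod(code, 100)
        return f"entry {entry} of array argument {arg} had an illegal value"
    return f"scalar argument {code} had an illegal value"


# Assumption: 970 is the INFO value, not an unrelated process exit status.
print(describe_pdpotrf_info(970))
```

If that assumption holds, a positive value would mean the factorization stopped because the matrix was not numerically positive definite at that order, which points toward the distributed matrix contents for the failing process layout rather than toward an invalid argument.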

         Could you suggest what the possible cause might be? Thanks very much.

Feng
