Channel: Clusters and HPC Technology

MPI generates numerous SCIF/scif_connect failure warnings


I am running a heterogeneous job on the host and a Xeon Phi coprocessor. If I run the MPI job on just the host or just the card, everything runs smoothly. When I split the job between the host and the Xeon Phi card, the MPI run completes successfully, but it generates numerous warning messages and is quite noisy. If the messages were meaningless, I would expect them not to be printed. They are not fatal: all the MPI messages complete, and so does the job. So what are the messages warning me to change to improve the MPI environment? I am running Intel MPI 5. The messages look like this (my system is named delphi, the MIC card is named mic0):

delphi-mic0:SCM:3a20:70677b80: 228 us(228 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 231 us(231 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 220 us(220 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 232 us(232 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:CMA:3a20:70677b80: 570 us(570 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
delphi-mic0:CMA:3a21:1ba1eb80: 808 us(808 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
delphi-mic0:CMA:3a20:70677b80: 554 us(554 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
delphi-mic0:CMA:3a21:1ba1eb80: 583 us(583 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
delphi-mic0:SCM:3a20:70677b80: 221 us(221 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 475 us(475 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 219 us(219 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 459 us(459 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 221 us(221 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 430 us(430 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 222 us(222 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 403 us(403 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 218 us(218 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 216 us(216 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:CMA:3a20:70677b80: 559 us(559 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
delphi-mic0:CMA:3a21:1ba1eb80: 729 us(729 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
delphi-mic0:UCM:3a20:70677b80: 207 us(207 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 193 us(193 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 200 us(200 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 200 us(200 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 218 us(218 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 303 us(303 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 198 us(198 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 232 us(232 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:CMA:3a21:1ba1eb80: 571 us(571 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
delphi-mic0:CMA:3a20:70677b80: 681 us(681 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
delphi-mic0:CMA:3a20:70677b80: 598 us(598 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
delphi-mic0:CMA:3a21:1ba1eb80: 818 us(818 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
delphi-mic0:SCM:3a20:70677b80: 224 us(224 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 223 us(223 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 226 us(226 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 231 us(231 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 225 us(225 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 229 us(229 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 195 us(195 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 200 us(200 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:CMA:3a20:70677b80: 564 us(564 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
delphi-mic0:CMA:3a21:1ba1eb80: 621 us(621 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
delphi-mic0:SCM:3a20:70677b80: 227 us(227 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 221 us(221 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 261 us(261 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 249 us(249 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 256 us(256 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 262 us(262 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 306 us(306 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 327 us(327 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 211 us(211 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 213 us(213 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 199 us(199 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 193 us(193 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 226 us(226 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 243 us(243 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 245 us(245 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 265 us(265 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 219 us(219 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 264 us(264 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 222 us(222 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 258 us(258 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 201 us(201 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 236 us(236 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 239 us(239 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 257 us(257 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 221 us(221 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a20:70677b80: 211 us(211 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 238 us(238 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:UCM:3a21:1ba1eb80: 296 us(296 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 225 us(225 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 227 us(227 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 223 us(223 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 279 us(279 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a20:70677b80: 232 us(232 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:SCM:3a21:1ba1eb80: 273 us(273 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:MCM:3a20:70677b80: 638 us(638 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a20:70677b80: 731 us(93 us):  open_hca: SCIF init ERR on qib0
delphi-mic0:SCM:3a21:1ba1eb80: 267 us(267 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:MCM:3a20:70677b80: 667 us(667 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a20:70677b80: 755 us(88 us):  open_hca: SCIF init ERR on qib0
delphi-mic0:SCM:3a21:1ba1eb80: 231 us(231 us):  open_hca: ibv_get_device_list() failed
delphi-mic0:MCM:3a20:70677b80: 845 us(845 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a20:70677b80: 932 us(87 us):  open_hca: SCIF init ERR on qib1
delphi-mic0:MCM:3a21:1ba1eb80: 829 us(829 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a21:1ba1eb80: 955 us(126 us):  open_hca: SCIF init ERR on qib0
delphi-mic0:MCM:3a20:70677b80: 555 us(555 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a20:70677b80: 632 us(77 us):  open_hca: SCIF init ERR on qib1
delphi-mic0:MCM:3a21:1ba1eb80: 566 us(566 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a21:1ba1eb80: 672 us(106 us):  open_hca: SCIF init ERR on qib0
delphi-mic0:MCM:3a21:1ba1eb80: 862 us(862 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a21:1ba1eb80: 963 us(101 us):  open_hca: SCIF init ERR on qib1
delphi-mic0:MCM:3a21:1ba1eb80: 842 us(842 us): scif_connect() to port 68, failed with error Connection refused
delphi-mic0:MCM:3a21:1ba1eb80: 950 us(108 us):  open_hca: SCIF init ERR on qib1
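
Since the dump above is long and repetitive, a small script that tallies the lines by tag (SCM, CMA, UCM, MCM) and message text can make it easier to see how many distinct warnings are involved. The following is only a rough sketch: the regular expression is modeled on the lines quoted above, not on any documented log format, and the names `LINE_RE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the output quoted in this post:
#   host:TAG:pid:tid: N us(M us): message
LINE_RE = re.compile(
    r"^(?P<host>[^:\s]+):(?P<tag>[A-Z]+):(?P<pid>[0-9a-f]+):(?P<tid>[0-9a-f]+):"
    r"\s*\d+ us\(\d+ us\):\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count occurrences of each (tag, message) pair; skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("tag"), match.group("msg").strip())] += 1
    return counts


# A few lines copied from the output above, used as sample input.
sample = [
    "delphi-mic0:SCM:3a20:70677b80: 228 us(228 us):  open_hca: ibv_get_device_list() failed",
    "delphi-mic0:SCM:3a21:1ba1eb80: 231 us(231 us):  open_hca: ibv_get_device_list() failed",
    "delphi-mic0:CMA:3a20:70677b80: 570 us(570 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?",
    "delphi-mic0:MCM:3a20:70677b80: 638 us(638 us): scif_connect() to port 68, failed with error Connection refused",
    "delphi-mic0:MCM:3a20:70677b80: 731 us(93 us):  open_hca: SCIF init ERR on qib0",
]

if __name__ == "__main__":
    # Print the most frequent (tag, message) pairs first.
    for (tag, msg), n in summarize(sample).most_common():
        print(f"{n:4d}  {tag}  {msg}")
```

To run it against a full job log instead of the sample, the list can be replaced with the lines of a captured output file (for example `open("mpi.log")`, where the file name is hypothetical).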