Channel: Clusters and HPC Technology

Help with Assertion failed in file ../../dapl_conn_rc.c


I am trying to run a Fortran MPI code (Incompact3d) on a cluster. The code works fine when run locally (i7) or on a single node (dual Xeon), even with aggressive optimization options such as -fast. On our cluster, however, I can only get it to work with gcc + Open MPI. With the Intel compiler and Intel MPI it fails.

For example, it works when built with gcc 4.6.3 as mpif90 -O3 -funroll-loops -ftree-vectorize -cpp -march=native -g -fbacktrace -ffast-math and launched with mpirun -machinefile nodefile -np 96 ./incompact3d, but it is very slow.

Some information about the Intel installation and the cluster:

$mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Copyright (C) 2003-2012, Intel Corporation. All rights reserved.

$mpiifort --version
ifort (IFORT) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

$uname -a
Linux cerrado01n 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

$ofed_info > ofed_info (attached file)

$rpm -qa | grep dapl -> no output

$ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 192379
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 192379
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
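The "max locked memory" line above matters for the DAPL (InfiniBand) fabric, which pins memory for RDMA. A minimal sketch of how one might check that limit programmatically is shown below, assuming Python 3 on Linux is available on the nodes. The helper names parse_ulimit and memlock_is_unlimited are hypothetical and are not part of Intel MPI or Incompact3d. The first parses saved `ulimit -a` text like the listing above; the second asks the running process for its own RLIMIT_MEMLOCK via the standard resource module.

```python
import re
import resource

# Hypothetical helper: matches lines such as
# "max locked memory       (kbytes, -l) unlimited"
# and captures the option letter ("l") and the value after it.
_LINE = re.compile(r"\(.*?-(\w)\)\s+(\S+)\s*$")


def parse_ulimit(text):
    """Map each `ulimit -a` option letter to its value string."""
    limits = {}
    for line in text.splitlines():
        m = _LINE.search(line)
        if m:
            limits[m.group(1)] = m.group(2)
    return limits


def memlock_is_unlimited():
    """True if this process's soft and hard RLIMIT_MEMLOCK are both unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


if __name__ == "__main__":
    print("memlock unlimited:", memlock_is_unlimited())
```

Because limits seen in an interactive shell can differ from those of processes started remotely, it may be worth running the check on every node in the same way the application is launched. This is an untested suggestion, not a confirmed diagnosis of the assertion failure.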

$ mpirun -genvall -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 96 ./incompact3d > log (attached file)

unexpected disconnect completion event from [27:cerrado02n]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 28

Am I doing something wrong? I have no clue.

Thanks in advance.

Attached files:
log.f90 (application/octet-stream, 27.14 KB)
ofed_info.f90 (application/octet-stream, 13.12 KB)
