When running a workload on multiple nodes with mpiexec.hydra, the entire run aborts as soon as even one node fails or shuts down. I want to detect whether the failure is due to a node disconnecting or to something else. Printing the exit codes with "-print-all-exitcodes" doesn't seem to work.
Is there any other option?
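
For context, the only fallback I can think of is a wrapper that captures the launcher's exit code and then probes each host after an abort. This is a rough sketch under assumptions, not a confirmed solution: the helper names (`read_hosts`, `ping`, `classify`) are hypothetical, the hostfile is assumed to list one host per line with an optional `:N` suffix, and the `ping` flags assume Linux. It is also only a heuristic, since a node that reboots quickly may answer the probe again by the time it runs.

```python
import subprocess
import sys


def read_hosts(path):
    """Return host names from a hostfile (assumed: one host per line, optional ':N')."""
    hosts = []
    with open(path) as f:
        for line in f:
            # Treating '#' as a comment marker is an assumption about the file format.
            line = line.split("#", 1)[0].strip()
            if line:
                hosts.append(line.split(":", 1)[0])
    return hosts


def ping(host):
    """True if the host answers a single ping (Linux 'ping' flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(exit_code, hosts, probe=ping):
    """Guess the failure cause from the launcher exit code and host reachability."""
    if exit_code == 0:
        return "ok", []
    down = [h for h in hosts if not probe(h)]
    if down:
        return "node-unreachable", down
    return "other-failure", []


def main(hostfile, command):
    hosts = read_hosts(hostfile)
    # Launch through mpiexec.hydra as usual, but keep the exit code for inspection.
    run = subprocess.run(["mpiexec.hydra", "-f", hostfile] + command)
    verdict, down = classify(run.returncode, hosts)
    print(f"exit code: {run.returncode}, verdict: {verdict}, unreachable: {down}")
    return run.returncode


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv[1], sys.argv[2:]))
```

It would be invoked as something like `python wrapper.py hosts.txt ./my_app`, where both file names are placeholders. I would prefer something built into Hydra over this kind of after-the-fact probing.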