Quantcast
Channel: Clusters and HPC Technology
Viewing all articles
Browse latest Browse all 927

MPI hangs with multiple processes on same node - Intel AI DevCloud

$
0
0

Hello,

I am using Intel AI DevCloud to run a Deep Reinforcement Learning training, using mpi4py to use several agents to collect data at the same time.

In my framework, I run N jobs (in different nodes) with agents and another job with the optimization algorithm in python.

When I run a single agent in each job, the application works correctly. However, when I try to run more than one agent in the same job (same node), the application hangs.

I do not think the problem is the application itself because it works when there is a single agent per node. Additionally, the same application used to work with multiple agents per node last year, when Intel DevCloud was CentOS.

The application is in this code: https://github.com/alexandremuzio/baselines/blob/71977df495e7840179dd05e...

 

Probably I am not presenting enough data about the problem, so if you need any information regarding MPI stuff I can send in this post.

 


Viewing all articles
Browse latest Browse all 927

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>