The following error occurred, executing vim log.err:
Traceback (most recent call last):
  File "run.py", line 91, in <module>
    main()
  File "run.py", line 38, in main
    run_exp(**vars(args))
  File "run.py", line 86, in run_exp
    execute_exp(config, run_type)
  File "run.py", line 69, in execute_exp
    trainer.train()
  File "/share/home/HCI/liuyang/anaconda3/envs/zson/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/share/home/HCI/liuyang/habitat-lab-challenge-2022/habitat_baselines/rl/ppo/ppo_trainer.py", line 709, in train
    self._init_train()
  File "/share/home/HCI/liuyang/habitat-lab-challenge-2022/habitat_baselines/rl/ppo/ppo_trainer.py", line 216, in _init_train
    self.config.RL.DDPPO.distrib_backend
  File "/share/home/HCI/liuyang/habitat-lab-challenge-2022/habitat_baselines/rl/ddppo/ddp_utils.py", line 266, in init_distrib_slurm
    backend, store=tcp_store, rank=world_rank, world_size=world_size
  File "/share/home/HCI/liuyang/anaconda3/envs/zson/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 608, in init_process_group
    _store_based_barrier(rank, store, timeout)
  File "/share/home/HCI/liuyang/anaconda3/envs/zson/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 247, in _store_based_barrier
    rank, store_key, world_size, worker_count, timeout
RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=4, worker_count=2, timeout=0:30:00)
srun: error: gpu01: task 0: Exited with exit code 1
srun: error: gpu04: task 0: Exited with exit code 1
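Reading the RuntimeError message: world_size=4 but worker_count=2, so only two of the four expected processes had registered at the barrier when the 30-minute timeout expired. A minimal sketch for pulling those numbers out of log.err is below. It assumes the message format shown in this log; the names `parse_barrier_timeout`, `BARRIER_RE` and `LOG_LINE` are hypothetical helpers, not part of PyTorch or habitat-lab.

```python
import re
from typing import Dict, Optional

# Assumption: the message layout matches the RuntimeError line in this log.
BARRIER_RE = re.compile(
    r"on rank: (?P<rank>\d+), for key: (?P<key>\S+) "
    r"\(world_size=(?P<world_size>\d+), worker_count=(?P<worker_count>\d+), "
    r"timeout=(?P<timeout>[\d:]+)\)"
)

# The error line copied from log.err.
LOG_LINE = (
    "RuntimeError: Timed out initializing process group in store based barrier "
    "on rank: 0, for key: store_based_barrier_key:1 "
    "(world_size=4, worker_count=2, timeout=0:30:00)"
)


def parse_barrier_timeout(message: str) -> Optional[Dict[str, int]]:
    """Hypothetical helper: extract rank and worker counts from the message."""
    match = BARRIER_RE.search(message)
    if match is None:
        return None  # the line is not a store-based barrier timeout
    world_size = int(match.group("world_size"))
    worker_count = int(match.group("worker_count"))
    return {
        "rank": int(match.group("rank")),
        "world_size": world_size,
        "worker_count": worker_count,
        # Processes that never reached the barrier.
        "missing": world_size - worker_count,
    }


if __name__ == "__main__":
    print(parse_barrier_timeout(LOG_LINE))
```

Running it over every log.err line would show whether each failing task reports the same rank and the same missing count, which may help tell apart "some processes never started" from "processes started but could not reach the store".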
What shall I do?
For reference, the job was submitted with sbatch scripts/imagenav-v1-hm3d-ovrl-rn50.sh.