For the past day I have been trying to figure out how to use multiple GPUs. In theory, parallelizing a model across multiple GPUs should be as easy as wrapping it in nn.DataParallel. However, I found that this does not work for me. To demonstrate the problem with the simplest, most canonical example I could find, I ran the code from the Data Parallelism tutorial line by line.
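For reference, this is roughly the code in question, condensed from the tutorial (RandomDataset, Model, and the sizes below are the tutorial's own, reproduced here rather than copied verbatim):

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Sizes from the tutorial
input_size = 5
output_size = 2
batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class RandomDataset(Dataset):
    """Dataset of random tensors, as in the tutorial."""
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

class Model(nn.Module):
    """A single linear layer that prints the per-replica batch size."""
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())
        return output

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    # Wrap the model so each forward pass is split across the visible GPUs
    model = nn.DataParallel(model)
model.to(device)

for data in rand_loader:
    input = data.to(device)
    output = model(input)   # this is the line that hangs for me
    print("Outside: input size", input.size(),
          "output_size", output.size())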
I have tried everything from exposing only specific subsets of the GPUs to CUDA to reinstalling everything CUDA-related, but I cannot figure out why I am unable to use more than one GPU. Some information about my machine:
OS: Ubuntu 16.04
GPUs: 4x 1080 Ti
PyTorch version: 1.0.1
CUDA version: 10.0
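(By exposing specific GPUs I mean restricting visibility with CUDA_VISIBLE_DEVICES before CUDA is initialized, along these lines; the indices are just an example of what I tried:)

import os

# Must be set before torch initializes CUDA; "0,1" is just an example subset
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
print(torch.cuda.device_count())  # reports only the GPUs listed above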
The traceback is as follows:
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-3-0f0d83e9ef13> in <module>
1 for data in rand_loader:
2 input = data.to(device)
----> 3 output = model(input)
4 print("Outside: input size", input.size(),
5 "output_size", output.size())
/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
141 return self.module(*inputs[0], **kwargs[0])
142 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 143 outputs = self.parallel_apply(replicas, inputs, kwargs)
144 return self.gather(outputs, self.output_device)
145
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
151
152 def parallel_apply(self, replicas, inputs, kwargs):
--> 153 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
154
155 def gather(self, outputs, output_device):
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
73 thread.start()
74 for thread in threads:
---> 75 thread.join()
76 else:
77 _worker(0, modules[0], inputs[0], kwargs_tup[0], devices[0])
/usr/local/lib/python3.6/threading.py in join(self, timeout)
1054
1055 if timeout is None:
-> 1056 self._wait_for_tstate_lock()
1057 else:
1058 # the behavior of a negative timeout isn't documented, but
/usr/local/lib/python3.6/threading.py in _wait_for_tstate_lock(self, block, timeout)
1070 if lock is None: # already determined that the C code is done
1071 assert self._is_stopped
-> 1072 elif lock.acquire(block, timeout):
1073 lock.release()
1074 self._stop()
KeyboardInterrupt:
Any insight into this error would be greatly appreciated. From my relatively limited knowledge of systems and CUDA, it seems to involve some kind of lock, but I cannot for the life of me figure out how to fix it.