Memory error when iterating over two dataloaders simultaneously in PyTorch

Date: 2019-09-11 11:30:54

Tags: out-of-memory pytorch dataloader

I am trying to train my model using two dataloaders from two different datasets.

Since my datasets have different lengths, I found out how to set this up with cycle() and zip() here: How to iterate over two dataloaders simultaneously using pytorch?
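For reference, the loop is set up roughly like this (a minimal sketch; train_loader_1 and train_loader_2 are the names taken from the traceback below, everything else is assumed):

    from itertools import cycle

    # cycle() repeats the shorter loader so that zip() keeps producing pairs
    # until the longer loader (train_loader_2) is exhausted.
    for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
        ...  # training step on the batch pair (x1, x2)

This loop fails with the following traceback: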

  File "/home/Desktop/example/train.py", line 229, in train_2
    for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 154140672 bytes. Error code 12 (Cannot allocate memory)

I tried to fix this by setting num_workers=0, using pin_memory=False and shuffle=False, and by reducing the batch size... but none of it worked... I have 256 GB of RAM and 4 NVIDIA Tesla V100 GPUs.
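For reference, those settings are the standard torch.utils.data.DataLoader keyword arguments; the construction I tried looks roughly like this (dataset_1 and the batch size are assumed for illustration):

    from torch.utils.data import DataLoader

    # Memory-related knobs turned all the way down: no worker processes,
    # no pinned host memory, no shuffling, and a small batch size.
    train_loader_1 = DataLoader(dataset_1, batch_size=4, shuffle=False,
                                num_workers=0, pin_memory=False)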

I also tried running the training on each dataloader separately instead of both at once, and that does work. But for my project I need to train on the two datasets in parallel...

1 Answer:

Answer 0 (score: 2)

Based on this discussion, I avoided any error by using the following approach instead of cycle() and zip().
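A minimal sketch of what that approach likely looks like, based on the linked forum thread (assuming the same train_loader_1 and train_loader_2 as above): a plain iterator is re-created on StopIteration, so unlike cycle(), which keeps a saved copy of every element it has yielded, this does not accumulate the first loader's batches in memory.

    # Re-create the iterator over the shorter loader whenever it runs out,
    # instead of using cycle(), which would keep every yielded batch alive.
    loader_1_iter = iter(train_loader_1)
    for i, x2 in enumerate(train_loader_2):
        try:
            x1 = next(loader_1_iter)
        except StopIteration:
            loader_1_iter = iter(train_loader_1)
            x1 = next(loader_1_iter)
        ...  # training step on the batch pair (x1, x2)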

All credits to @srossi93 from this PyTorch post!