RuntimeError: unable to open shared memory object, OSError: [Errno 24] Too many open files

Posted: 2018-08-08 07:37:30

Tags: python memory pytorch

I am having trouble loading document indices. To test my code, I set

batch_size = 4
number_of_sentences_in_document = 84
number_of_words_in_sentence = 80

which amounts to 80 * 84 * 4 document indices per mini-batch.
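As a quick sanity check (trivial arithmetic, using the figures above), each mini-batch carries:

```python
# Sizes from the test configuration above.
batch_size = 4
number_of_sentences_in_document = 84
number_of_words_in_sentence = 80

# Total word indices exchanged per mini-batch between the DataLoader
# workers and the main process.
indices_per_batch = (batch_size
                     * number_of_sentences_in_document
                     * number_of_words_in_sentence)
print(indices_per_batch)  # 26880
```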

The problem is that when I wrap this index dataset in a DataLoader as shown below and try to iterate over trainloader, it produces a flood of error messages.

DataManager = DS.NewsDataset(data_examples_gen, Vocab)
trainloader = torch.utils.data.DataLoader(DataManager, batch_size=Args.args.batch_size, shuffle=True, num_workers=32)

Below is the error output.

Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 125, in reduce_storage
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 186, in __call__
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 476, in rmtree
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 474, in rmtree
OSError: [Errno 24] Too many open files: '/tmp/pymp-be4nmgxw'
Process Process-2:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 125, in reduce_storage
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 121, in reduce_storage
RuntimeError: unable to open shared memory object </torch_54415_3383444026> in read-write mode at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/TH/THAllocator.c:342

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 186, in __call__
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 476, in rmtree
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 474, in rmtree
OSError: [Errno 24] Too many open files: '/tmp/pymp-abguy87b'
Process Process-1:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 121, in reduce_storage
RuntimeError: unable to open shared memory object </torch_54415_3383444026> in read-write mode at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/TH/THAllocator.c:342
Traceback (most recent call last):
  File "/home/nlpgpu3/LinoHong/FakeNewsByTitle/main.py", line 26, in <module>
    for mini_batch in trainloader :
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 54416) exited unexpectedly with exit code 1.

Process finished with exit code 1

I figured this was some kind of memory issue, so I tried the same thing with only two sentences per document, and it worked. However, I expect batch_size to go up to 32 or 64, the number of sentences per document up to 84, and the number of words per sentence up to 84.

I tried

$ ulimit -n 10000

but that didn't work.
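For reference, two workarounds commonly suggested for this error (a sketch under assumptions, not a verified fix for this exact setup): raise the file-descriptor limit from inside the Python process itself, since `ulimit -n` run in a separate shell does not affect an already-running interpreter, or switch PyTorch's tensor-sharing strategy so that workers stop consuming file descriptors:

```python
import resource

# Inspect and raise this process's open-file limit. The soft limit can be
# raised up to the hard limit without root privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

# PyTorch-specific alternative (not exercised here): share tensors through
# the file system instead of file descriptors.
# import torch.multiprocessing
# torch.multiprocessing.set_sharing_strategy('file_system')
```

Reducing `num_workers` from 32 to a small number also lowers the number of descriptors held open at once, which may be enough on its own.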

How can I fix this problem? Any ideas?

0 Answers:

There are no answers yet.