I'm having trouble loading document indices. I'm testing my code, so I set

batch_size = 4
number_of_sentences_in_document = 84
number_of_words_in_sentence = 80

which aggregates one mini_batch into 80 * 84 * 4 document indices.
The problem is that when I wrap this index dataset in a DataLoader as shown below and try to iterate over the trainloader, it throws a flood of error messages.
DataManager = DS.NewsDataset(data_examples_gen, Vocab)
trainloader = torch.utils.data.DataLoader(DataManager, batch_size=Args.args.batch_size, shuffle=True, num_workers=32)
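For context, here is a minimal self-contained sketch of the kind of setup I have (ToyNewsDataset is a stand-in I wrote for this question, not my actual DS.NewsDataset; num_workers=0 here just to keep the sketch runnable anywhere):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical stand-in for DS.NewsDataset: each item is one document of
# word indices with shape (sentences, words) -- 84 x 80 in my test setup.
class ToyNewsDataset(Dataset):
    def __init__(self, n_docs=8, n_sentences=84, n_words=80):
        self.data = torch.randint(0, 1000, (n_docs, n_sentences, n_words))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

loader = DataLoader(ToyNewsDataset(), batch_size=4, shuffle=True, num_workers=0)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([4, 84, 80])
```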
Here is the error message:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 125, in reduce_storage
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 186, in __call__
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 476, in rmtree
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 474, in rmtree
OSError: [Errno 24] Too many open files: '/tmp/pymp-be4nmgxw'

Process Process-2:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 125, in reduce_storage
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files

Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 121, in reduce_storage
RuntimeError: unable to open shared memory object </torch_54415_3383444026> in read-write mode at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/TH/THAllocator.c:342
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/util.py", line 186, in __call__
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 476, in rmtree
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/shutil.py", line 474, in rmtree
OSError: [Errno 24] Too many open files: '/tmp/pymp-abguy87b'

Process Process-1:
Traceback (most recent call last):
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 61, in _worker_loop
    data_queue.put((idx, samples))
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 341, in put
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 121, in reduce_storage
RuntimeError: unable to open shared memory object </torch_54415_3383444026> in read-write mode at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/TH/THAllocator.c:342

Traceback (most recent call last):
  File "/home/nlpgpu3/LinoHong/FakeNewsByTitle/main.py", line 26, in <module>
    for mini_batch in trainloader :
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/nlpgpu3/anaconda3/envs/linohong3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 54416) exited unexpectedly with exit code 1.
Process finished with exit code 1
I figured this was some kind of memory issue, so I tried the same thing with documents of only two sentences each, and it worked. However, I want to scale this up so that batch_size goes up to 32 or 64, the number of sentences per document up to 84, and the number of words per sentence up to 84.
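One thing I'm considering (a sketch based on the PyTorch multiprocessing docs, not verified to fix my case): switching torch.multiprocessing away from the default file_descriptor sharing strategy, since that strategy keeps one open file descriptor per tensor storage shared between the loader workers and the main process, which could plausibly exhaust the open-file limit with many workers.

```python
import torch.multiprocessing as mp

# Strategies available on this platform (Linux should offer both
# 'file_descriptor' and 'file_system')
print(mp.get_all_sharing_strategies())

# 'file_system' names shared-memory files instead of passing fds around,
# trading the fd exhaustion risk for /dev/shm usage
mp.set_sharing_strategy('file_system')
print(mp.get_sharing_strategy())  # file_system
```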
I tried

$ ulimit -n 10000

but that didn't work.
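For reference, the limit my process actually runs with can be checked (and the soft limit raised) from inside Python, since `ulimit` in one shell does not carry over to jobs launched from another session; a sketch:

```python
import resource

# Soft and hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# Raising the soft limit up to the hard limit needs no special privileges
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```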
How can I fix this? Any ideas?