MultiprocessIterator raises an error when batch_size is changed

Time: 2019-02-15 11:57:01

Tags: python python-multiprocessing chainer chainercv

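I want to train Faster R-CNN with ChainerCV. As a first test, I mostly copied the provided example and changed only the lines that refer to the dataset so that it uses my custom dataset. I ran all the checks described in this tutorial to confirm that my dataset is fully functional.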

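If I run the script without any changes, everything works, but as soon as I change batch_size I get an error. I tried increasing shared_mem from 100 MB to 1000 MB, but the error did not go away.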

Error with batch_size = 2:

Exception in main training loop: all the input array dimensions except for the concatenation axis must match exactly
Traceback (most recent call last):
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 171, in update_core
    in_arrays = self.converter(batch, self.device)
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 134, in concat_examples
    [example[i] for example in batch], padding[i])))
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 164, in _concat_arrays
    return xp.concatenate([array[None] for array in arrays])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "/home/cv/ChainerCV/faster_rcnn/train.py", line 131, in <module>
    main()
  File "/home/cv/ChainerCV/faster_rcnn/train.py", line 126, in main
    trainer.run()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 171, in update_core
    in_arrays = self.converter(batch, self.device)
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 134, in concat_examples
    [example[i] for example in batch], padding[i])))
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 164, in _concat_arrays
    return xp.concatenate([array[None] for array in arrays])
ValueError: all the input array dimensions except for the concatenation axis must match exactly

System information:

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
Number of accessible CPU cores                : 8

__OS Information__
Platform                                      : Linux-4.15.0-45-generic-x86_64-with-debian-stretch-sid
Release                                       : 4.15.0-45-generic
System Name                                   : Linux
Version                                       : #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019
OS specific info                              : debianstretch/sid
glibc info                                    : glibc 2.10

__CUDA Information__
Found 1 CUDA devices
id 0     b'GeForce GTX 1080'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 1
Summary:
    1/1 devices are supported
CUDA driver version                           : 10000

__Conda Information__
conda_build_version                           : 3.17.6
conda_env_version                             : 4.6.3
platform                                      : linux-64
python_version                                : 3.7.1.final.0

Edit: The error also occurs when running the unmodified example with batch_size = 2.

3 answers:

Answer 0 (score: 2)

While trying to fix the error, I ran into another error:

ValueError: Currently only batch size 1 is supported.

Waiting seems to be the solution for now.

Answer 1 (score: 1)

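The current Faster R-CNN implementation does not support training with a batch size greater than 1, but you can rewrite it to support this, as in the following code: https://github.com/knorth55/chainer-light-head-rcnn/blob/master/light_head_rcnn/links/model/light_head_rcnn_train_chain.py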

Another option is to use Faster R-CNN with FPN in ChainerCV. The latest version of ChainerCV includes Faster R-CNN with FPN, which supports training with multiple images per batch: https://github.com/chainer/chainercv/blob/master/examples/fpn/train_multi.py

Answer 2 (score: 0)

self.converter assumes that the inputs in batch all have the same shape. For example, if you use an image dataset, every image must have the same (C, H, W) shape.
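
To make the shape requirement concrete, here is a minimal sketch that reproduces the failing step from the traceback (`xp.concatenate([array[None] for array in arrays])`) with plain NumPy. The helper name `stack_batch` and the small array sizes are made up for illustration; this is not Chainer's actual converter, only the same concatenation pattern.

```python
import numpy as np


def stack_batch(arrays):
    # Same pattern as the line shown in the traceback: give every array a
    # leading batch axis of length 1, then concatenate along that axis.
    return np.concatenate([array[None] for array in arrays])


# Two "images" with identical (C, H, W) shapes stack without problems.
same = [np.zeros((3, 4, 5), dtype=np.float32),
        np.zeros((3, 4, 5), dtype=np.float32)]
print(stack_batch(same).shape)

# Two "images" whose widths differ cannot be stacked into one array.
mixed = [np.zeros((3, 4, 5), dtype=np.float32),
         np.zeros((3, 4, 6), dtype=np.float32)]
try:
    stack_batch(mixed)
except ValueError as exc:
    print("ValueError:", exc)
```

With batch_size = 1 there is only one array per batch, so the shapes can never disagree, which would explain why the unmodified script runs.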

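So, could you check whether your dataset returns images of the same shape? And if your dataset contains images of various shapes, how about using TransformDataset, as in https://github.com/chainer/chainercv/blob/df63b74ef20f9d8c830e266881e577dd05c17442/examples/faster_rcnn/train.py#L86?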
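
One way to run the shape check suggested above is to tally the image shapes the dataset returns. The sketch below is only an illustration: `count_image_shapes` is a hypothetical helper, and it assumes each example is a tuple whose first element is the image and exposes a `.shape` attribute (as a NumPy array does). A tiny stand-in dataset is used so the snippet runs without Chainer; in practice you would pass your own dataset object, before and after wrapping it in TransformDataset.

```python
from collections import Counter, namedtuple


def count_image_shapes(dataset, limit=None):
    """Count how often each image shape occurs among the first `limit` examples."""
    n = len(dataset) if limit is None else min(limit, len(dataset))
    shapes = Counter()
    for i in range(n):
        # Assumption: the image is the first element of each example.
        img = dataset[i][0]
        shapes[tuple(img.shape)] += 1
    return shapes


# Stand-in for real data: objects that only carry a `shape` attribute.
FakeImage = namedtuple("FakeImage", ["shape"])
toy_dataset = [
    (FakeImage((3, 600, 800)), None, None),
    (FakeImage((3, 600, 800)), None, None),
    (FakeImage((3, 600, 900)), None, None),
]

shape_counts = count_image_shapes(toy_dataset)
for shape, count in shape_counts.items():
    print(shape, count)
```

If more than one shape is reported, the default converter cannot stack those images into a single batch.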