Question

I'm fine-tuning Faster-RCNN with PyTorch following this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

The results are quite good, but prediction only works when I feed a single tensor to the model. For example:

# This works well
>>> img, _ = dataset_test[3]
>>> img.shape
torch.Size([3, 1200, 1600])
>>> model.eval()
>>> with torch.no_grad():
...     preds = model([img.to(device)])

But when I feed several images at once as a single 4D tensor, I get this error:

>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = torch.stack([dataset_test[idx][0] for idx in random_idx])
>>> images.shape
torch.Size([4, 3, 1200, 1600])
>>> with torch.no_grad():
...     preds = model(images.to(device))
RuntimeError                              Traceback (most recent call last)
<ipython-input-101-52caf8fee7a4> in <module>()
      5 model.eval()
      6 with torch.no_grad():
----> 7   prediction =  model(images.to(device))

...

RuntimeError: The expanded size of the tensor (1600) must match the existing size (1066) at non-singleton dimension 2.  Target sizes: [3, 1200, 1600].  Tensor sizes: [3, 800, 1066]

Edit

It works when I pass a list of 3D tensors instead. (IMO this behavior is a bit odd; I don't understand why it doesn't work with a 4D tensor.)

>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = [dataset_test[idx][0].to(device) for idx in random_idx]
>>> len(images)
4
>>> with torch.no_grad():
...     preds = model(images)
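If the images are already stacked into a single [N, C, H, W] tensor, one option is to split it back into the list of 3D tensors the model expects. A minimal sketch using only standard PyTorch (`batch_to_list` is a hypothetical helper name):

```python
import torch


def batch_to_list(images: torch.Tensor) -> list:
    """Split an [N, C, H, W] batch into a list of N tensors of shape [C, H, W]."""
    if images.dim() != 4:
        raise ValueError(f"expected a 4D tensor, got shape {tuple(images.shape)}")
    # unbind(0) returns views along the batch dimension, so no data is copied
    return list(images.unbind(0))


images = torch.zeros(4, 3, 1200, 1600)
image_list = batch_to_list(images)
print(len(image_list), image_list[0].shape)  # 4 torch.Size([3, 1200, 1600])
```

With the model from the question this would be called as `model([img.to(device) for img in batch_to_list(images)])`. Stacking in the first place only works when all images already share one size, which a plain list does not require.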

Answer 1

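In training mode, MaskRCNN expects a list of tensors as the "input images" and a list of dicts as the "targets" (torchvision's Faster R-CNN, which MaskRCNN extends, has the same interface). This design choice comes from the fact that each image can contain a different number of objects: the target tensors have a different size for every image, so the targets have to be passed as a list rather than as a single batched tensor.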
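To illustrate with made-up box values: torchvision's detection models take one target dict per image containing `boxes` (a float tensor of shape [N, 4] in x1, y1, x2, y2 format) and `labels` (an int64 tensor of shape [N]). Because N differs from image to image, the per-image tensors cannot be stacked into one batch tensor:

```python
import torch

# Hypothetical targets: image 0 contains two objects, image 1 contains one.
targets = [
    {
        "boxes": torch.tensor([[10.0, 20.0, 100.0, 200.0], [50.0, 60.0, 150.0, 160.0]]),
        "labels": torch.tensor([1, 2], dtype=torch.int64),
    },
    {
        "boxes": torch.tensor([[30.0, 40.0, 300.0, 400.0]]),
        "labels": torch.tensor([1], dtype=torch.int64),
    },
]

box_shapes = [tuple(t["boxes"].shape) for t in targets]
print(box_shapes)  # [(2, 4), (1, 4)]

# torch.stack requires identical shapes, so batching the boxes fails
try:
    torch.stack([t["boxes"] for t in targets])
    stack_failed = False
except RuntimeError:
    stack_failed = True
print("stack failed:", stack_failed)  # stack failed: True
```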

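Using a list of image tensors instead of a batched tensor is not strictly required, however. My guess is that they went with a list for the images as well, for consistency. It also has the added benefit of accepting images of different sizes as input, rather than requiring a fixed size.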
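Accepting variable-size images does not rule out batched computation inside the model: the images can be zero-padded to a common size. The sketch below shows that idea under stated assumptions; `pad_to_batch` and the `size_divisible=32` rounding are illustrative choices, not torchvision's actual code, although torchvision's `GeneralizedRCNNTransform` does something similar internally.

```python
import math

import torch


def pad_to_batch(images, size_divisible=32):
    """Zero-pad a list of [C, H, W] tensors into one [N, C, H_max, W_max] batch."""
    channels = images[0].shape[0]
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    # Round both spatial dims up to a multiple of size_divisible
    max_h = int(math.ceil(max_h / size_divisible) * size_divisible)
    max_w = int(math.ceil(max_w / size_divisible) * size_divisible)
    batch = images[0].new_zeros((len(images), channels, max_h, max_w))
    for i, img in enumerate(images):
        # Copy each image into the top-left corner; the remainder stays zero padding
        batch[i, :, : img.shape[1], : img.shape[2]].copy_(img)
    return batch


images = [torch.ones(3, 800, 1066), torch.ones(3, 600, 900)]
batch = pad_to_batch(images)
print(batch.shape)  # torch.Size([2, 3, 800, 1088])
```

A real pipeline would also keep each image's original size so that predicted boxes can be mapped back; this sketch leaves that out.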

Because of this design choice, the model also expects a list of tensors as input in evaluation mode.
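In evaluation mode the model returns one prediction dict per input image, so a larger set of images can be processed as fixed-size chunks, each passed as a list. A minimal sketch, assuming `model` is the fine-tuned detector from the question; `predict_in_batches` and the `StubDetector` module used to exercise it are hypothetical:

```python
import torch
from torch import nn


def predict_in_batches(model, images, batch_size=4, device="cpu"):
    """Run a detection model on a list of [C, H, W] tensors, batch_size images at a time."""
    model.eval()
    predictions = []
    with torch.no_grad():
        for start in range(0, len(images), batch_size):
            # Each chunk is passed as a list of 3D tensors, not as a stacked 4D tensor
            chunk = [img.to(device) for img in images[start:start + batch_size]]
            predictions.extend(model(chunk))
    return predictions


class StubDetector(nn.Module):
    """Hypothetical stand-in that returns an empty prediction dict per image."""

    def forward(self, images):
        return [
            {
                "boxes": torch.zeros(0, 4),
                "labels": torch.zeros(0, dtype=torch.int64),
                "scores": torch.zeros(0),
            }
            for _ in images
        ]


images = [torch.rand(3, 120, 160) for _ in range(10)]
preds = predict_in_batches(StubDetector(), images, batch_size=4)
print(len(preds))  # 10
```

With the question's setup this would look like `predict_in_batches(model, [dataset_test[idx][0] for idx in random_idx], device=device)`.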

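As for speed, this design choice may hurt performance somewhat during evaluation, but I'm not 100% sure. During training, on the other hand, the target tensors have different dimensions for each image, so the loss has to be computed by iterating over the images one by one anyway. Passing a batched image tensor instead of a list of image tensors would therefore not make training any faster.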
