Question

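I'm trying to use 3D convolution on the CIFAR-10 dataset (just for fun). The docs show that the input is normally a 5D tensor (N, C, D, H, W). Do I really have to pass 5-dimensional data?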

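I doubt it because a 3D convolution just means my kernel slides along 3 dimensions/directions. So technically I should be able to pass a 3D, 4D, 5D or even 100D tensor, and it should work fine as long as the tensor has at least 3 dimensions. Isn't that right?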

I tried it quickly, and it does give an error:

import torch


def conv3d_example():
    N,C,H,W = 1,3,7,7
    img = torch.randn(N,C,H,W)
    ##
    in_channels, out_channels = 1, 4
    kernel_size = (2,3,3)
    conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)
    ##
    out = conv(img)
    print(out)
    print(out.size())

##
conv3d_example()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-29c73923cc64> in <module>
     15 
     16 ##
---> 17 conv3d_example()

<ipython-input-3-29c73923cc64> in conv3d_example()
     10     conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)
     11     ##
---> 12     out = conv(img)
     13     print(out)
     14     print(out.size())

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    474                             self.dilation, self.groups)
    475         return F.conv3d(input, self.weight, self.bias, self.stride,
--> 476                         self.padding, self.dilation, self.groups)
    477 
    478 

RuntimeError: Expected 5-dimensional input for 5-dimensional weight 4 1 2 3, but got 4-dimensional input of size [1, 3, 7, 7] instead

Cross-posted:

Answer 1

Consider the following scenario. You have a 3-channel NxN image. In PyTorch this image has size 3xNxN (ignoring the batch dimension for now).

Suppose you pass this image to a 2D convolutional layer with no bias, a 5x5 kernel, padding of 2, 3 input channels and 10 output channels.
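A minimal sketch of the layer just described, assuming an arbitrary image side length N = 8 (any N works). It shows that PyTorch stores one 3x5x5 kernel per output channel, and that padding 2 with a 5x5 kernel keeps the spatial size at NxN.

```python
import torch

# Assumption: N = 8 is an arbitrary image side length chosen for illustration.
N = 8
conv = torch.nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5,
                       padding=2, bias=False)

img = torch.randn(1, 3, N, N)   # a batch containing one 3xNxN image
out = conv(img)

print(conv.weight.shape)  # torch.Size([10, 3, 5, 5]): one 3x5x5 kernel per output channel
print(out.shape)          # torch.Size([1, 10, 8, 8])
```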

What actually happens when we apply this layer to the input image?

You can think of it like this:

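For each of the 10 output channels there is a kernel of size 3x5x5. This kernel is applied to the 3xNxN input image as a 3D convolution that is unpadded along the first (channel) dimension. Because the kernel spans all 3 channels, it fits in exactly one position along that dimension, so the result is a 1xNxN feature map.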
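To make the "3D convolution with no padding along the first dimension" view concrete, here is a from-scratch NumPy sketch. The helper name `conv3d_full_depth` is hypothetical, and like PyTorch's convolution layers it computes a cross-correlation (the kernel is not flipped). Zero padding is applied only to H and W, so the output depth is C - 3 + 1 = 1.

```python
import numpy as np

def conv3d_full_depth(x, k, pad):
    """Hypothetical helper: 3D cross-correlation of a CxHxW input with a
    CxKhxKw kernel, zero-padded on H and W only (not on the channel axis)."""
    C, H, W = x.shape
    kc, kh, kw = k.shape
    assert kc == C, "kernel depth must equal the number of input channels"
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on H, W only
    out_d = C - kc + 1             # = 1: the kernel fits once along the channel axis
    out_h = H + 2 * pad - kh + 1
    out_w = W + 2 * pad - kw + 1
    out = np.zeros((out_d, out_h, out_w))
    for d in range(out_d):
        for i in range(out_h):
            for j in range(out_w):
                # Sum of the element-wise product over a full 3x5x5 window.
                out[d, i, j] = np.sum(xp[d:d + kc, i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
N = 6
x = rng.standard_normal((3, N, N))   # a 3xNxN image
k = rng.standard_normal((3, 5, 5))   # one 3x5x5 kernel
fmap = conv3d_full_depth(x, k, pad=2)
print(fmap.shape)  # (1, 6, 6)
```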

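Since there are 10 output channels, there are 10 kernels of size 3x5x5. After all the kernels have been applied, their outputs are stacked into a single 10xNxN tensor.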
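The stacking step can be checked directly in PyTorch. This sketch (again assuming N = 8) applies each of the layer's 10 kernels on its own with `torch.nn.functional.conv2d`, concatenates the ten 1xNxN maps along the channel dimension, and compares the result with the layer's own output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N = 8  # assumption: arbitrary image size
conv = torch.nn.Conv2d(3, 10, kernel_size=5, padding=2, bias=False)
img = torch.randn(1, 3, N, N)

# Apply each 3x5x5 kernel separately; each call yields a (1, 1, N, N) feature map.
maps = [F.conv2d(img, conv.weight[i:i + 1], padding=2) for i in range(10)]

# Stacking the 10 maps along the channel dimension reproduces the layer output.
stacked = torch.cat(maps, dim=1)
print(stacked.shape)                                  # torch.Size([1, 10, 8, 8])
print(torch.allclose(stacked, conv(img), atol=1e-5))  # True
```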

In fact, in the classical sense, a 2D convolutional layer is already performing a 3D convolution.
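One way to verify this claim, sketched under the assumption N = 8: build a `Conv3d` with a single input channel and a kernel depth of 3, padded only on H and W, copy the `Conv2d` weights into it, and feed it the image reshaped so that the colour channels become the depth axis. The two outputs match.

```python
import torch

torch.manual_seed(0)
N = 8  # assumption: arbitrary image size
conv2d = torch.nn.Conv2d(3, 10, kernel_size=5, padding=2, bias=False)

# A 3D layer whose kernel depth equals the number of image channels (3),
# padded only on H and W (no padding along depth).
conv3d = torch.nn.Conv3d(1, 10, kernel_size=(3, 5, 5), padding=(0, 2, 2), bias=False)

with torch.no_grad():
    # Conv2d weight (10, 3, 5, 5) -> Conv3d weight (10, 1, 3, 5, 5)
    conv3d.weight.copy_(conv2d.weight.unsqueeze(1))

img = torch.randn(1, 3, N, N)
out2d = conv2d(img)                    # (1, 10, N, N)
out3d = conv3d(img.unsqueeze(1))       # input (1, 1, 3, N, N) -> output (1, 10, 1, N, N)
out3d = out3d.squeeze(2)               # drop the depth-1 axis -> (1, 10, N, N)

print(torch.allclose(out2d, out3d, atol=1e-5))  # True
```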

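Similarly, a 3D convolutional layer actually performs a 4D convolution over (C, D, H, W). Adding the batch dimension N gives the 5-dimensional input (N, C, D, H, W) it requires.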
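Applied to the code in the question, here is a sketch of two possible ways to get a 5D input from a 4D image batch. The choice between them is an assumption about what you want the kernel to slide over. Option (a) treats the 3 colour channels as the depth axis of a single-channel volume, which matches the question's `in_channels=1` and `kernel_size=(2, 3, 3)`. Option (b) keeps 3 input channels and adds a dummy depth axis of size 1, which is effectively an ordinary 2D convolution. In both cases the `Conv3d` weight is 5D, (out_channels, in_channels, kD, kH, kW), which is the "5-dimensional weight" mentioned in the error message.

```python
import torch

N, C, H, W = 1, 3, 7, 7
img = torch.randn(N, C, H, W)

# Option (a): colour channels become the depth axis D of a 1-channel volume.
# (N, C, H, W) -> (N, 1, C, H, W)
conv_a = torch.nn.Conv3d(in_channels=1, out_channels=4, kernel_size=(2, 3, 3))
out_a = conv_a(img.unsqueeze(1))
print(conv_a.weight.shape)  # torch.Size([4, 1, 2, 3, 3])
print(out_a.shape)          # torch.Size([1, 4, 2, 5, 5])

# Option (b): keep 3 input channels and add a dummy depth axis of size 1.
# (N, C, H, W) -> (N, C, 1, H, W); the kernel depth must then be 1.
conv_b = torch.nn.Conv3d(in_channels=3, out_channels=4, kernel_size=(1, 3, 3))
out_b = conv_b(img.unsqueeze(2))
print(out_b.shape)          # torch.Size([1, 4, 1, 5, 5])
```

Note that more recent PyTorch releases also accept an unbatched 4D input (C, D, H, W) for `Conv3d`, so depending on your version, a 4D tensor may be interpreted that way instead of being rejected.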

How to use 3D convolution on a standard 3-channel image?
