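So, the answer is probably "the two networks are not the same" and I am missing something obvious, but as far as I understand they should do the same thing, yet their intermediate output dimensions differ. My main question concerns PyTorch's ConvTranspose3d and Conv3d. In particular, I have an input of shape (1, 44, 68, 120), where 44 is the depth dimension and 68 and 120 are the height and width. I use strided conv / convtranspose layers to downsample / upsample. Shouldn't this: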
self.conv3d1_1 = nn.Conv3d(in_channels=1, out_channels=32, groups=1, stride=(2,2,2), kernel_size=3, padding=1)
be equivalent, in terms of output size, to these two layers?
self.conv3d1_1 = nn.Conv3d(in_channels=1, out_channels=32, groups=1, stride=(2,1,1), kernel_size=3, padding=1)
self.conv3d1_2 = nn.Conv3d(in_channels=32, out_channels=32, groups=1, stride=(1,2,2), kernel_size=3, padding=1)
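The single layer halves all dimensions at once, whereas the two-layer block first reduces the temporal dimension and then the spatial dimensions?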
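One way to check this on paper is the per-axis output-size formula given in the PyTorch Conv3d documentation, out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. The sketch below is plain Python with no torch dependency; the helper names conv_out and conv3d_shape are made up for illustration and are not PyTorch functions.

```python
def conv_out(size, kernel_size, stride, padding, dilation=1):
    """Output length of one axis of a strided convolution (illustrative helper)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv3d_shape(shape, kernel_size, stride, padding):
    """Apply conv_out to each of the (depth, height, width) axes."""
    return tuple(conv_out(n, kernel_size, s, padding) for n, s in zip(shape, stride))


input_shape = (44, 68, 120)  # (depth, height, width), batch and channels omitted

# One layer with stride (2, 2, 2).
single = conv3d_shape(input_shape, kernel_size=3, stride=(2, 2, 2), padding=1)

# Two layers: stride (2, 1, 1) followed by stride (1, 2, 2).
step1 = conv3d_shape(input_shape, kernel_size=3, stride=(2, 1, 1), padding=1)
step2 = conv3d_shape(step1, kernel_size=3, stride=(1, 2, 2), padding=1)

print(single)  # (22, 34, 60)
print(step1)   # (22, 68, 120)
print(step2)   # (22, 34, 60)
```

So on the downsampling side the two variants do agree in output size, which matches the sizes reported for both networks.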
So, the first network:
self.conv3d1_1 = nn.Conv3d(in_channels=1, out_channels=32, groups=1, stride=(2,2,2), kernel_size=3, padding=1) # depthwise convolution (1 ch, 44 depth dimension, h, w)
self.conv3d1_2 = nn.Conv3d(in_channels=32, out_channels=64, groups=1, kernel_size=3, padding=1)
self.conv3d2_1 = nn.Conv3d(in_channels=64, out_channels=128, groups=1, stride=(2, 2, 2), kernel_size=3, padding=1)
self.conv3d2_2 = nn.Conv3d(in_channels=128, out_channels=128, groups=1, kernel_size=3, padding=1)
self.conv3d3_1 = nn.ConvTranspose3d(in_channels=128, out_channels=128, stride=(2, 2, 2), groups=1, kernel_size=4, padding=(1,1,1))
self.conv3d3_2 = nn.Conv3d(in_channels=128, out_channels=128, groups=1, kernel_size=3, padding=1)
self.conv3d4_1 = nn.ConvTranspose3d(in_channels=128, out_channels=128, stride=(2, 2, 2), groups=1, kernel_size=4, padding=1)
self.conv3d4_2 = nn.Conv3d(in_channels=128, out_channels=64, groups=1, kernel_size=3, padding=1)
produces these intermediate dimensions:
torch.Size([2, 1, 44, 68, 120])
torch.Size([2, 32, 22, 34, 60])
torch.Size([2, 64, 22, 34, 60])
torch.Size([2, 128, 11, 17, 30])
torch.Size([2, 128, 11, 17, 30])
torch.Size([2, 128, 22, 34, 60])
torch.Size([2, 128, 22, 34, 60])
torch.Size([2, 128, 44, 68, 120])
torch.Size([2, 64, 44, 68, 120])
Everything is fine here.
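These sizes can be reproduced without running the model, using the output-size formulas from the PyTorch Conv3d and ConvTranspose3d documentation. The following is a minimal sketch in plain Python; the helpers conv_out and conv_transpose_out and the layers table are illustrative names, with dilation fixed at 1 and output_padding at 0 as in the layers of the first network.

```python
def conv_out(size, kernel_size, stride, padding):
    """Conv axis length: floor((in + 2p - (k - 1) - 1) / s) + 1, dilation 1."""
    return (size + 2 * padding - (kernel_size - 1) - 1) // stride + 1


def conv_transpose_out(size, kernel_size, stride, padding, output_padding=0):
    """Transposed conv axis length: (in - 1)*s - 2p + (k - 1) + output_padding + 1."""
    return (size - 1) * stride - 2 * padding + (kernel_size - 1) + output_padding + 1


# (kind, kernel_size, stride per axis, padding), in the order of the first network.
layers = [
    ("conv", 3, (2, 2, 2), 1),   # conv3d1_1
    ("conv", 3, (1, 1, 1), 1),   # conv3d1_2
    ("conv", 3, (2, 2, 2), 1),   # conv3d2_1
    ("conv", 3, (1, 1, 1), 1),   # conv3d2_2
    ("tconv", 4, (2, 2, 2), 1),  # conv3d3_1
    ("conv", 3, (1, 1, 1), 1),   # conv3d3_2
    ("tconv", 4, (2, 2, 2), 1),  # conv3d4_1
    ("conv", 3, (1, 1, 1), 1),   # conv3d4_2
]

shape = (44, 68, 120)  # (depth, height, width)
trace = [shape]
for kind, k, stride, p in layers:
    fn = conv_out if kind == "conv" else conv_transpose_out
    shape = tuple(fn(n, k, s, p) for n, s in zip(shape, stride))
    trace.append(shape)

for t in trace:
    print(t)
```

The printed spatial sizes follow the same sequence as the torch.Size lines listed for the first network, ending back at (44, 68, 120).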
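The second network (shortened here to a single reduction per dimension, but the same thing happens, and gets worse, when downsampling twice, i.e. by 4x):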
self.conv3d1_1 = nn.Conv3d(in_channels=1, out_channels=32, groups=1, stride=(2,1,1), kernel_size=3, padding=1) # depthwise convolution (1 ch, 44 depth dimension, h, w)
self.conv3d1_2 = nn.Conv3d(in_channels=32, out_channels=64, groups=1, kernel_size=3, padding=1)
self.conv3d2_1 = nn.Conv3d(in_channels=64, out_channels=128, groups=1, stride=(1, 2, 2), kernel_size=3, padding=1)
self.conv3d2_2 = nn.Conv3d(in_channels=128, out_channels=128, groups=1, kernel_size=3, padding=1)
self.conv3d3_1 = nn.ConvTranspose3d(in_channels=128, out_channels=128, stride=(1, 2, 2), groups=1, kernel_size=4, padding=(1,1,1))
self.conv3d3_2 = nn.Conv3d(in_channels=128, out_channels=128, groups=1, kernel_size=3, padding=1)
self.conv3d4_1 = nn.ConvTranspose3d(in_channels=128, out_channels=128, stride=(2, 1, 1), groups=1, kernel_size=4, padding=1)
self.conv3d4_2 = nn.Conv3d(in_channels=128, out_channels=64, groups=1, kernel_size=3, padding=1)
These are the intermediate dimensions:
torch.Size([2, 1, 44, 68, 120])
torch.Size([2, 32, 22, 68, 120])
torch.Size([2, 64, 22, 68, 120])
torch.Size([2, 128, 22, 34, 60])
torch.Size([2, 128, 22, 34, 60])
torch.Size([2, 128, 23, 68, 120])
torch.Size([2, 128, 23, 68, 120])
torch.Size([2, 128, 46, 69, 121])
torch.Size([2, 64, 46, 69, 121])
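For some reason, the first ConvTranspose3d increases the temporal dimension (22 to 23), when it should only act on the spatial dimensions. I initially thought this was a padding issue, but changing the padding does not fix it. The same goes for the temporal ConvTranspose3d, which increases each spatial dimension by 1 (68 to 69 and 120 to 121).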
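The extra element is at least consistent with the output-size formula in the PyTorch ConvTranspose3d documentation, out = (in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1. With kernel_size=4 and padding=1, an axis with stride 2 doubles exactly, but an axis with stride 1 grows by one. The sketch below checks this in plain Python; conv_transpose_out and conv_transpose3d_shape are illustrative helpers, not PyTorch functions. The last part is only an assumption about one possible adjustment (a per-axis kernel_size of 3 on the stride-1 axes) and has not been run against the full model.

```python
def conv_transpose_out(size, kernel_size, stride, padding, output_padding=0, dilation=1):
    """Output length of one axis of a transposed convolution (illustrative helper)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


def conv_transpose3d_shape(shape, kernel_size, stride, padding):
    """Apply conv_transpose_out per (depth, height, width) axis."""
    return tuple(
        conv_transpose_out(n, k, s, padding)
        for n, k, s in zip(shape, kernel_size, stride)
    )


# Second network as posted: kernel_size=4 on every axis, padding=1.
# The kernel_size=3, padding=1, stride=1 Conv3d layers in between keep the size,
# so they are left out here.
up1 = conv_transpose3d_shape((22, 34, 60), (4, 4, 4), (1, 2, 2), 1)
up2 = conv_transpose3d_shape(up1, (4, 4, 4), (2, 1, 1), 1)
print(up1)  # (23, 68, 120)
print(up2)  # (46, 69, 121)

# Assumed alternative: kernel 3 on stride-1 axes, kernel 4 on stride-2 axes.
fix1 = conv_transpose3d_shape((22, 34, 60), (3, 4, 4), (1, 2, 2), 1)
fix2 = conv_transpose3d_shape(fix1, (4, 3, 3), (2, 1, 1), 1)
print(fix1)  # (22, 68, 120)
print(fix2)  # (44, 68, 120)
```

If that reading is right, the change in the model would amount to passing a tuple for kernel_size, for example kernel_size=(3, 4, 4) together with stride=(1, 2, 2), since nn.ConvTranspose3d accepts per-axis tuples for both arguments.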
Any clues? Thanks in advance.