I am trying to predict thousands of images with a CNN, as shown in the figure below. Below you can see the CNN architecture and some details. I would like to know why the network is not learning correctly. As you can see, the predicted image values (contours) are far from the ground truth (the predictions are more than 10x too large). It seems the network does well at edge detection (low-level features) but fails on high-level features.
kernel = 3
num_filters = 12
batch_size = 128
lr = 1e-5
class Model(nn.Module):
    def __init__(self, kernel, num_filters, res = ResidualBlock):
        super(Model, self).__init__()
        self.conv0 = nn.Sequential(
            nn.Conv2d(4, num_filters, kernel_size = kernel*3,
                      padding = 4),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(inplace=True))
        self.conv1 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size = kernel,
                      stride=2, padding = 1),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*4, kernel_size = kernel, stride=2, padding = 1),
            nn.BatchNorm2d(num_filters*4),
            nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(
            nn.Conv2d(num_filters*4, num_filters*8, kernel_size = kernel, stride=2, padding = 2),
            nn.BatchNorm2d(num_filters*8),
            nn.ReLU(inplace=True))
        self.conv4 = nn.Sequential(
            nn.Conv2d(num_filters*8, num_filters*16, kernel_size = kernel, stride=2, padding = 1),
            nn.BatchNorm2d(num_filters*16),
            nn.ReLU(inplace=True))
        self.tsconv0 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*16, num_filters*8, kernel_size = kernel, padding = 1),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(num_filters*8))
        self.tsconv1 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*8, num_filters*4, kernel_size = kernel, padding = 1),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(num_filters*4))
        self.tsconv2 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*4, num_filters*2, kernel_size = kernel, padding = 1),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(num_filters*2))
        self.tsconv3 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*2, num_filters, kernel_size = kernel, padding = 1),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(num_filters))
        self.tsconv4 = nn.Sequential(
            nn.Conv2d(num_filters, 1, kernel_size = kernel*3, padding = 0, bias=False),
            nn.ReLU(inplace=True))

    def forward(self, x):
        x0 = self.conv0(x)    #([6, 600, 600])
        x1 = self.conv1(x0)   #([12, 300, 300])
        x2 = self.conv2(x1)   #([24, 150, 150])
        x3 = self.conv3(x2)   #([48, 76, 76])
        x4 = self.conv4(x3)   #([96, 38, 38])
        x5 = self.tsconv0(x4) #([48, 76, 76])
        x6 = self.tsconv1(x5) #([24, 152, 152])
        x7 = self.tsconv2(x6) #([12, 304, 304])
        x8 = self.tsconv3(x7) #([6, 608, 608])
        x9 = self.tsconv4(x8) #([1, 600, 600])
        return x9
Answer 0 (score: 1)
This is an autoencoder, right? You just want to reconstruct the input image? And the lower image is the output while the upper one is the ground truth? Then I would suggest two things. First, it seems your architecture does not really generate many feature maps. You only go from 6 feature maps up to 96. Usually in CNNs you would go from, say, 6 up to 512. For example:
layer1: 6 - layer2: 64 - layer3: 128 - layer4: 256 - layer5: 512 ...
That may be why it cannot learn high-level features: the model simply does not have enough capacity. You could also try making the feature maps in the bottleneck layer spatially smaller than 38, around 8-16.
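A minimal sketch of what such a wider encoder could look like, assuming the 4 input channels from the question; the widths 64/128/256/512 and the helper name `make_encoder` are illustrative, not from the original post:

```python
import torch
import torch.nn as nn

def make_encoder(in_channels=4, widths=(64, 128, 256, 512)):
    """Stack stride-2 conv blocks, doubling the channel width each stage."""
    layers = []
    prev = in_channels
    for w in widths:
        layers += [
            nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(w),
            nn.ReLU(inplace=True),
        ]
        prev = w
    return nn.Sequential(*layers)

encoder = make_encoder()
x = torch.randn(1, 4, 96, 96)
z = encoder(x)              # each stride-2 conv halves the spatial size
print(z.shape)              # torch.Size([1, 512, 6, 6])
```

Note how the channel count grows (4 -> 64 -> 128 -> 256 -> 512) while the spatial size shrinks, which is the usual trade-off in CNN encoders.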
Second thing: if you do not rely on a strict bottleneck, you can add skip connections. That is, take one layer (or all layers) from the encoder and add them to the decoder layers of the same dimensions. Hope this helps!
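The skip-connection idea above can be sketched as follows. This is a toy two-stage autoencoder, not the asker's model; the class name, channel counts, and 64x64 input size are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class SkipAutoencoder(nn.Module):
    """Encoder/decoder where decoder features are summed with the
    encoder features of matching shape (additive skip connection)."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc1 = nn.Sequential(
            nn.Conv2d(4, ch, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        e1 = self.enc1(x)        # [ch,   H/2, W/2]
        e2 = self.enc2(e1)       # [2ch,  H/4, W/4]
        d1 = self.dec1(e2) + e1  # skip: add encoder features of same size
        return self.dec2(d1)     # [1,    H,   W]

model = SkipAutoencoder()
y = model(torch.randn(1, 4, 64, 64))
print(y.shape)                   # torch.Size([1, 1, 64, 64])
```

Concatenation along the channel dimension (as in U-Net) is a common alternative to the addition used here; it preserves more information but requires the following conv layer to accept the combined channel count.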