Question

首先，我是这些主题的新手。我的神经网络在训练和验证集上将所有内容分类为人或背景。培训集 VOC2011 。

#Defining model
from keras_fcn import FCN
fcn_vgg16 = FCN(input_shape=(500, 500, 3), classes=21)
fcn_vgg16.load_weights('fcn_vgg16_weights.h5')


#Preprocessing to image
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np

img_path = 'catdog2.jpg'
img1 = image.load_img(img_path, target_size=(500, 500))
x = image.img_to_array(img1)
x = np.expand_dims(x, axis=0)
#x = preprocess_input(x)

predicted = fcn_vgg16.predict(x)

predicted = np.squeeze(predicted, axis=0)


#I may have completely misunderstood the visualization part
color_list = {0:[176, 23,  31],  1:[220, 20,  60],  2:[139,  71,  93],  3:[0,     9, 236],  3:[255,  20, 147], 
              4:[139,  0, 139],  5:[0,    0, 255],  6:[202, 225, 255],  7:[30,  144, 255],  8:[240, 248, 255],
              9:[0,  245, 255], 10:[0,  199, 140], 11:[0,   255, 127], 12:[139, 131, 134], 13:[255, 255,   0],
             14:[255,165,   0], 15:[255, 153, 18], 16:[255,  69,   0], 17:[255,   0,   0], 18:[0,     0,   0],
             19:[219,219, 219], 20:[0,  245, 255],255:[0,     0, 205]}
#*15:[255, 153, 18] => Orange*


"""
{0: 'background',
 1: 'aeroplane',
 2: 'bicycle',
 3: 'bird',
 4: 'boat',
 5: 'bottle',
 6: 'bus',
 7: 'car',
 8: 'cat',
 9: 'chair',
 10: 'cow',
 11: 'diningtable',
 12: 'dog',
 13: 'horse',
 14: 'motorbike',
 15: 'person',
 16: 'potted-plant',
 17: 'sheep',
 18: 'sofa',
 19: 'train',
 20: 'tv/monitor',
 255: 'ambigious'}
 """

#http://warmspringwinds.github.io/tensorflow/tf-slim/2017/01/23/fully-convolutional-networks-(fcns)-for-image-segmentation/


import scipy.misc as smp
#Create a 500x500x3 array of 8 bit unsigned integers
data = np.zeros((500, 500, 3), dtype=np.uint8)

for i in range(500):
    for j in range(500):
        data[i][j] = color_list[np.argmax(predicted[i][j])]

img2 = smp.toimage(data)


import matplotlib.pyplot as plt
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.imshow(np.squeeze(img1, axis=0))
ax2.imshow(img2)
plt.show()

这是一种局部最小值（或马鞍点？），可以将所有内容预测为背景（最常见的标签）？

原因是什么，你有什么想法？还有什么方法可以实现语义分割的可视化吗？

我想训练有些不对劲。如果过度拟合，则损失总是减少，为什么结果与传输数据相同。

Answer 1

关于标签图像可视化的问题。您不需要使用imshow自己的颜色映射。您可以使用cmap关键字参数进行imshow，并选择定性颜色映射。

更多阅读：

示例：

plt.imshow(img2, cmap="Set1", interpolation="nearest")

Answer 2

关于将所有内容归类为背景的问题。

我假设所提供的图像从训练的早期进展到以后（可能来自训练集）。

一些建议：

最初使用加权成本函数进行训练，其权重与数据集中标签的比例相反。由于先前的力量，这应该超过任何最小/马鞍。
尝试使用正则化参数。

为什么我的神经网络将一切都识别为语义分割任务的人或背景？

2 个答案: