Image regression with a CNN

Asked: 2018-08-15 05:34:49

Tags: tensorflow regression conv-neural-network tflearn

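My immediate problem is that every CNN regression model I have tried returns the same (or very similar) value for every input, and I am trying to work out why. I am open to all kinds of suggestions, though.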

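My dataset looks like this:

  • x: 64x64 grayscale images arranged into a 64 x 64 x n ndarray
  • y: values between 0 and 1, one per image (think of it as some kind of proportion)
  • weather: 4 weather readings from the time each image was taken (ambient temperature, humidity, dew point, air pressure)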
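
A minimal sketch of getting the image stack into the batch-first layout used further down, assuming x really is stored as 64 x 64 x n. The array here is random stand-in data and the sample count is made up for illustration.

```python
import numpy as np

n = 10  # stand-in sample count (assumption, not the real dataset size)
x = np.random.rand(64, 64, n)  # images stacked along the last axis

# Move the sample axis to the front: (64, 64, n) -> (n, 64, 64)
x_batch_first = np.moveaxis(x, -1, 0)

# Add a trailing channel axis for grayscale: (n, 64, 64) -> (n, 64, 64, 1)
x_train = x_batch_first[..., np.newaxis].astype(np.float32)

print(x_train.shape)  # (10, 64, 64, 1)
```

Each x_train[i, :, :, 0] is then the same image as x[:, :, i].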

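The goal is to use the images and the weather data to predict y. Since I am working with images, I thought a CNN would be appropriate (please let me know if there are other strategies I should consider).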

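As I understand it, CNNs are most often used for classification tasks, and using them for regression is rather unusual. In theory it should not be very different, though: I just need to change the loss function to MSE/RMSE and the last activation function to linear (although a sigmoid may be more appropriate here, since y is between 0 and 1).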

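The first hurdle I ran into was figuring out how to incorporate the weather data; the natural choice was to merge it into the first fully connected layer. I found an example here: How to train mix of image and data in CNN using ImageAugmentation in TFlearn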
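
At the array level, merging the weather readings into a fully connected layer amounts to concatenating them onto the image feature vector. The sketch below uses made-up stand-in data; the standardization step is an assumption about sensible preprocessing (the four readings live on very different scales), not something taken from the linked example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # stand-in batch size (assumption)

# Stand-in for the activations coming out of a 4096-unit fully connected layer
image_features = rng.random((n, 4096))

# Stand-in weather readings: temperature, humidity, dew point, pressure
weather = rng.normal(loc=[20.0, 60.0, 12.0, 1013.0],
                     scale=[5.0, 15.0, 4.0, 8.0],
                     size=(n, 4))

# Standardize each weather column so no single reading dominates by scale
weather_std = (weather - weather.mean(axis=0)) / weather.std(axis=0)

# Concatenate along the feature axis: (n, 4096) + (n, 4) -> (n, 4100)
merged = np.concatenate([image_features, weather_std], axis=1)

print(merged.shape)  # (8, 4100)
```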

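The second hurdle was deciding on an architecture. Normally I would just pick a paper and copy its architecture, but I could not find anything on CNN image regression. So I tried a (very simple) network with 3 convolutional layers and 2 fully connected layers, and then the VGGNet and AlexNet architectures from https://github.com/tflearn/tflearn/tree/master/examples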

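The problem I am having now is that every model I try outputs the same value, namely the mean of y over the training set. Looking at TensorBoard, the loss flattens out quite quickly (after about 25 epochs). Do you know what is going on here? While I do understand the basics of what each layer does, I have no intuition for what makes a good architecture for a particular dataset or task.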
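
For context on why the training mean is the value a stuck model lands on: among all constant predictions, the mean of y is the one that minimizes MSE, and the loss at that point equals the variance of y. A small sketch with random stand-in targets (not the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(1000)  # stand-in targets in [0, 1]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


# Try every constant prediction on a 0.01 grid and keep the best one
candidates = np.linspace(0.0, 1.0, 101)
losses = [mse(y, c) for c in candidates]
best = candidates[int(np.argmin(losses))]

# MSE of always predicting the mean is exactly the (population) variance
baseline = mse(y, y.mean())

print(best, y.mean())
print(baseline, y.var())
```

If the training loss plateaus at roughly the variance of y_train, the network is doing no better than this constant baseline.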

Here is an example. I am using the VGGNet from the tflearn examples page:

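import tensorflow as tf
import tflearn
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.layers.merge_ops import merge

tf.reset_default_graph()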

img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_flip_updown()
img_aug.add_random_90degrees_rotation(rotations=[0, 1, 2, 3])

convnet = input_data(shape=[None, size, size, 1], 
                     data_augmentation=img_aug, 
                     name='hive')
weathernet = input_data(shape=[None, 4], name='weather')

convnet = conv_2d(convnet, 64, 3, activation='relu', scope='conv1_1')
convnet = conv_2d(convnet, 64, 3, activation='relu', scope='conv1_2')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool1')

convnet = conv_2d(convnet, 128, 3, activation='relu', scope='conv2_1')
convnet = conv_2d(convnet, 128, 3, activation='relu', scope='conv2_2')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool2')

convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_1')
convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_2')
convnet = conv_2d(convnet, 256, 3, activation='relu', scope='conv3_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool3')

convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_1')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_2')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv4_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool4')

convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_1')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_2')
convnet = conv_2d(convnet, 512, 3, activation='relu', scope='conv5_3')
convnet = max_pool_2d(convnet, 2, strides=2, name='maxpool5')

convnet = fully_connected(convnet, 4096, activation='relu', scope='fc6')
convnet = merge([convnet, weathernet], 'concat')
convnet = dropout(convnet, .75, name='dropout1')

convnet = fully_connected(convnet, 4096, activation='relu', scope='fc7')
convnet = dropout(convnet, .75, name='dropout2')

convnet = fully_connected(convnet, 1, activation='sigmoid', scope='fc8')

convnet = regression(convnet, 
                     optimizer='adam', 
                     learning_rate=learning_rate, 
                     loss='mean_square', 
                     name='targets')

model = tflearn.DNN(convnet, 
                    tensorboard_dir='log', 
                    tensorboard_verbose=0)

model.fit({
            'hive': x_train,
            'weather': weather_train  
          },
          {'targets': y_train}, 
          n_epoch=1000, 
          batch_size=batch_size,
          validation_set=({
              'hive': x_val,
              'weather': weather_val
          }, 
                          {'targets': y_val}), 
          show_metric=False, 
          shuffle=True,
          run_id='poop')

For reference, my objects are:

  • x_train is an ndarray of shape (n, 64, 64, 1)
  • weather_train is an ndarray of shape (n, 4)
  • y_train is an ndarray of shape (n, 1)

Overfitting is another concern, but given that the model performs poorly on the training set, I would rather worry about that later.

1 answer:

Answer 0: (score: 0)

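To address your concern about getting the same predicted value for every instance in the test set: you have a few options here that do not involve changing the structure of the conv net: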

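  1. You can rescale the target variable with sklearn's StandardScaler() (it standardizes features by removing the mean and scaling to unit variance)
  2. Scale the pixel data; performance typically improves with scaled pixel data, and a common rule of thumb is to divide the pixel data by 255.0 (as shown at the end of this post)
  3. You can experiment with the learning rate and the loss function (a CNN that outputs the same value for every prediction has settled on that value as the point of minimum error it could find)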
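
The first two options can be sketched in plain NumPy as follows. The arrays are random stand-ins, and the pixel scaling assumes 8-bit images with values in 0..255. The target scaling mirrors what StandardScaler computes (subtract the mean, divide by the population standard deviation); it is written out by hand here only to keep the sketch dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 8-bit grayscale images and targets (assumptions, not the real data)
images = rng.integers(0, 256, size=(16, 64, 64, 1)).astype(np.float32)
y_train = rng.random((16, 1))

# Option 2: scale pixel values from [0, 255] into [0, 1]
images_scaled = images / 255.0

# Option 1: standardize the target to zero mean and unit variance
y_mean = y_train.mean(axis=0)
y_std = y_train.std(axis=0)
y_scaled = (y_train - y_mean) / y_std

# Predictions made in the scaled space are mapped back the same way
y_restored = y_scaled * y_std + y_mean
```

Note that a standardized target takes negative values and values above 1, so it cannot be matched by a sigmoid output layer.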

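Next, if you are performing regression, make sure the final fully connected layer uses a linear activation function rather than a sigmoid. A linear activation passes the neuron's weighted sum of inputs through unchanged, so the output is proportional to that weighted input.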

convnet = fully_connected(convnet, 1, activation='linear', scope='fc8')
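
To make the difference concrete, here is a small NumPy sketch comparing the two output activations on a handful of made-up pre-activation values. It illustrates two general properties (the sigmoid's bounded range and its shrinking gradient when saturated); whether either is what stalls the model in the question is not established here.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up pre-activations (weighted sums arriving at the output neuron)
z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

linear_out = z              # linear activation: output equals the weighted sum
sigmoid_out = sigmoid(z)    # sigmoid activation: output confined to (0, 1)

# Gradient of each activation with respect to z
linear_grad = np.ones_like(z)                     # constant 1 everywhere
sigmoid_grad = sigmoid_out * (1.0 - sigmoid_out)  # at most 0.25, tiny when saturated

print(sigmoid_out)
print(sigmoid_grad)
```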

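Finally, I recently implemented ResNet50 in Keras for a regression task. Here is the construction of that network; this version does not allow loading pre-trained weights, and it must receive images of shape (224, 224, 3).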

from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D, MaxPooling2D, DepthwiseConv2D
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, Input, Add, ZeroPadding2D, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.models import Model
from keras import backend


def block1(x, filters, kernel_size=3, stride=1, conv_shortcut=True, name=None):
    """
    A residual block

    :param x: input tensor
    :param filters: integer, filters of the bottleneck layer
    :param kernel_size: kernel size of bottleneck
    :param stride: stride of first layer
    :param conv_shortcut: use convolution shortcut if true, otherwise identity shortcut
    :param name: string, block label
    :return: Output tensor of the residual block

    """

    # bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1

    bn_axis = -1

    if conv_shortcut is True:
        shortcut = Conv2D(4 * filters, 1, strides=stride, name=name+'_0_conv')(x)
        shortcut = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_0_bn')(shortcut)
    else:
        shortcut = x

    x = Conv2D(filters, 1, strides=stride, name=name+'_1_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_1_bn')(x)
    x = Activation('relu', name=name+'_1_relu')(x)

    x = Conv2D(filters, kernel_size, padding='SAME', name=name+'_2_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_2_bn')(x)
    x = Activation('relu', name=name+'_2_relu')(x)

    x = Conv2D(4 * filters, 1, name=name+'_3_conv')(x)
    x = BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name+'_3_bn')(x)

    x = Add(name=name+'_add')([shortcut, x])
    x = Activation('relu', name=name+'_out')(x)

    return x


def stack1(x, filters, blocks, stride1=2, name=None):
    """
    a set of stacked residual blocks

    :param x: input tensor
    :param filters: int, filters of the bottleneck layer in the block
    :param blocks: int, blocks in the stacked blocks,
    :param stride1: stride of the first layer in the first block
    :param name: stack label
    :return: output tensor for the stacked blocks

    """

    x = block1(x, filters, stride=stride1, name=name+'_block1')

    for i in range(2, blocks+1):
        x = block1(x, filters, conv_shortcut=False, name=name+'_block'+str(i))

    return x

def resnet(height, width, depth, stack_fn, use_bias=False, nodes=256):
    """
    :param height: height of image, int
    :param width: image width, int
    :param depth: number of image channels, int
    :param stack_fn: function that stacks residual blocks
    :param nodes: width of nodes included in top layer of CNN, int
    :return: a Keras model instance
    """

    input_shape = (height, width, depth)

    img_input = Input(shape=input_shape)

    x = ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(img_input)
    x = Conv2D(64, 7, strides=2, use_bias=use_bias, name='conv1_conv')(x)

    x = ZeroPadding2D(padding=((1, 1), (1, 1)), name='pool1_pad')(x)
    x = MaxPooling2D(3, strides=2, name='pool1_pool')(x)

    x = stack_fn(x)

    # top layer
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(nodes, activation='relu')(x)

    # perform regression
    x = Dense(1, activation='linear')(x)

    model = Model(img_input, x)

    return model


def resnet50(height, width, depth, nodes):

    def stack_fn(x):
        x = stack1(x, 64, 3, stride1=1, name='conv2')
        x = stack1(x, 128, 4, name='conv3')
        x = stack1(x, 256, 6, name='conv4')
        x = stack1(x, 512, 3, name='conv5')
        return x

    return resnet(height, width, depth, stack_fn, nodes=nodes)
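
As a rough check of how far this network downsamples its input, the spatial size after each stage can be traced with plain arithmetic. This is a sketch based on reading the layer definitions above (padded 7x7 stride-2 conv, padded 3x3 stride-2 pooling, then a strided 1x1 conv at the start of each stack); conv_out and trace_resnet50 are hypothetical helper names, not part of Keras.

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of an unpadded ('valid') conv or pool applied after
    adding `pad` zeros on each side of the input."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_resnet50(size):
    """Spatial size (height or width) after each stage of the network above."""
    sizes = {}
    size = conv_out(size, kernel=7, stride=2, pad=3)  # conv1_pad + conv1_conv
    sizes['conv1'] = size
    size = conv_out(size, kernel=3, stride=2, pad=1)  # pool1_pad + pool1_pool
    sizes['pool1'] = size
    # Only the first 1x1 conv of each stack is strided; the remaining layers
    # in a stack keep the spatial size unchanged.
    for name, stride in [('conv2', 1), ('conv3', 2), ('conv4', 2), ('conv5', 2)]:
        size = conv_out(size, kernel=1, stride=stride)
        sizes[name] = size
    return sizes


print(trace_resnet50(224))
print(trace_resnet50(64))
```

A 224-pixel input ends at a 7x7 feature map before global average pooling, while a 64-pixel input like the one in the question would end at 2x2.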

This can be used with some x_train, x_test, y_train, y_test data (where x_train/x_test are image data and y_train/y_test are numeric values on the interval [0, 1]).

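import numpy as np
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# load_images, quadruple_target, PATH_features, Target and lossFN are the
# author's own helpers and settings; their definitions are not shown here.
scaler = MinMaxScaler()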
images = load_images(df=target, path=PATH_features, resize_shape=(224, 224), quadruple=True)
images = images / 255.0  # scale pixel data to [0, 1]
images = images.astype(np.float32)
imshape = images.shape

target = target[Target]
target = quadruple_target(target, target=Target)

x_train, x_test, y_train, y_test = train_test_split(images, target, test_size=0.3, random_state=101)

y_train = scaler.fit_transform(y_train)
y_test = scaler.transform(y_test)

model = resnet50(imshape[1], imshape[2], imshape[3], nodes=256)

opt = Adam(lr=1e-5, decay=1e-5 / 200)
model.compile(loss=lossFN, optimizer=opt)

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), verbose=1, epochs=200)

pred = model.predict(x_test)
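
Because the targets were min-max scaled before training, the predictions come back in the scaled space. A minimal sketch of mapping them back and scoring them, with made-up numbers standing in for the real targets and for the output of model.predict; with the fitted scaler above, scaler.inverse_transform(pred) should do the same mapping.

```python
import numpy as np

# Made-up targets in original units (stand-ins for the real data)
y_train_raw = np.array([[0.2], [0.5], [0.9], [0.4]])
y_test_raw = np.array([[0.3], [0.8]])

# Min-max statistics come from the training targets only
y_min = y_train_raw.min(axis=0)
y_max = y_train_raw.max(axis=0)


def to_scaled(y):
    """Map original units to the [0, 1] range seen during training."""
    return (y - y_min) / (y_max - y_min)


def to_original(y_scaled):
    """Invert the min-max scaling back to original units."""
    return y_scaled * (y_max - y_min) + y_min


# Stand-in for model.predict(x_test), expressed in the scaled space
pred_scaled = np.array([[0.25], [0.75]])

pred = to_original(pred_scaled)
rmse = float(np.sqrt(np.mean((pred - y_test_raw) ** 2)))

print(pred.ravel(), rmse)
```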