Convolutional neural network with the fully connected layers replaced by an LSTM

Asked: 2017-04-10 05:21:23

Tags: tensorflow deep-learning conv-neural-network lstm

I'm trying to build a neural network for baseball that detects the position of the ball and of the strike zone as the pitch crosses the plate, but my network seems to get stuck in a local minimum: it returns the same values for every item in the dataset.

My approach is to take 34 frames as the ball approaches the plate and use them to detect when the ball crosses the plate.

The outputs are ball_left, ball_top, ball_width, strike_zone_left, strike_zone_top, strike_zone_width, strike_zone_height and frame_where_ball_crossed_plate.
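To make that concrete, each training example gets a flat target vector of those values. A minimal sketch of the layout (the numbers and exact ordering here are made up for illustration, not taken from my real pipeline):

    import numpy as np

    # Hypothetical target vector for one pitch (values are invented):
    y_example = np.array([
        120.0,  # ball_left
        80.0,   # ball_top
        12.0,   # ball_width
        140.0,  # strike_zone_left
        60.0,   # strike_zone_top
        50.0,   # strike_zone_width
        70.0,   # strike_zone_height
        21.0,   # frame_where_ball_crossed_plate (index into the 34 frames)
    ], dtype=np.float32)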

My model runs convolutions over each frame, but instead of ending in fully connected layers I use an LSTM, so the network can infer things from the preceding frames. I need that inference because the ball is sometimes not visible, either because the pitcher is in front of it or because it is already in the catcher's glove.
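In terms of tensor shapes, the trick is to fold the time dimension into the batch dimension for the convolutions and unfold it again for the LSTM. A rough sketch with toy sizes (in the real graph the per-frame feature dimension comes from the conv stack, not the raw pixels):

    import tensorflow as tf

    batch, seq, w, h = 4, 34, 320, 180
    frames = tf.placeholder(tf.float32, [batch, seq, w, h])

    # Fold time into batch so a single conv graph processes every frame at once
    per_frame = tf.reshape(frames, [batch * seq, w, h, 1])  # (136, 320, 180, 1)

    # (conv + pool layers would run on per_frame here)

    # Unfold back to one feature vector per frame for the LSTM
    flat = tf.reshape(per_frame, [batch * seq, w * h])      # flatten each frame
    sequence = tf.reshape(flat, [batch, seq, w * h])        # (4, 34, 57600)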

Watching the cost over time, the network appears to sit at a local minimum, and it produces the same result for every pitch in the training set.
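A quick way I check this is to look at the per-output spread across a batch of predictions; here with a toy array standing in for the network's output (if the net has collapsed, every row is identical and the standard deviation is zero):

    import numpy as np

    # Toy stand-in for a batch of 8 predictions with 10 outputs each; if the
    # network returns the same answer for every pitch, per-output std is ~0.
    preds = np.array([[0.0, 0.0, 0.31, 0.0, 0.52, 0.0, 0.0, 0.0, 0.0, 0.0]] * 8)
    print("per-output std across batch:", preds.std(axis=0))  # all zeros here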

Here is my code.

    filter_size1 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters1 = 16  # There are 16 of these filters.

    filter_size2 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters2 = 36  # There are 36 of these filters.

    filter_size3 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters3 = 36  # There are 36 of these filters.

    num_hidden = 256
    lstm_layers = 2

    num_channels = 1

    num_classes = 10

    width = 320
    height = 180

    sequence_length = Directories.Pitch_Sequence_Length  # 34 frames per pitch

    x = tf.placeholder(tf.float32, shape=[None, sequence_length, width, height], name='x')

    keep_prob = tf.placeholder(tf.float32, name="keep_prob")

    # Fold the time dimension into the batch dimension so the conv layers
    # see one frame at a time: [batch * sequence_length, width, height, 1]
    x_image = tf.reshape(x, [-1, width, height, num_channels])

    y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')

    layer_conv1, weights_conv1, biases_conv1 = ConvolutionNeuralNetwork.new_conv_layer(input=x_image,
                                                                                       num_input_channels=num_channels,
                                                                                       filter_size=filter_size1,
                                                                                       num_filters=num_filters1,
                                                                                       use_pooling=True)

    layer_conv2, weights_conv2, biases_conv2 = ConvolutionNeuralNetwork.new_conv_layer(input=layer_conv1,
                                                                                       num_input_channels=num_filters1,
                                                                                       filter_size=filter_size2,
                                                                                       num_filters=num_filters2,
                                                                                       use_pooling=True)

    layer_conv3, weights_conv3, biases_conv3 = ConvolutionNeuralNetwork.new_conv_layer(input=layer_conv2,
                                                                                       num_input_channels=num_filters2,
                                                                                       filter_size=filter_size3,
                                                                                       num_filters=num_filters3,
                                                                                       use_pooling=True)

    layer_flat, num_features = ConvolutionNeuralNetwork.flatten_layer(layer_conv3)

    # Restore the sequence dimension: [batch, sequence_length, num_features]
    fc_sequence = tf.reshape(layer_flat, [-1, sequence_length, int(layer_flat.shape[1])])

    # Two-layer LSTM with dropout applied to each layer's outputs
    cell = tf.contrib.rnn.BasicLSTMCell(num_hidden)

    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell([cell] * lstm_layers)

    # static_rnn expects a time-major list of [batch, features] tensors
    outputs, states = tf.contrib.rnn.static_rnn(cell, tf.unstack(tf.transpose(fc_sequence, perm=[1, 0, 2])), dtype=tf.float32)

    # Regression head on the last time step's output
    self.y_pred = ConvolutionNeuralNetwork.new_fc_layer(outputs[-1], num_hidden, num_classes, use_relu=True)

    # Mean squared error cost, minimized with Adam
    self.cost = tf.reduce_mean(tf.pow(self.y_pred - y_true, 2))

    self.optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(self.cost)
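For completeness, this is roughly how I drive the graph, with dropout on during training and off during evaluation (a simplified sketch using random stand-in data rather than my real pitch batches):

    import numpy as np

    # Random stand-in batch of 2 pitches
    train_x = np.random.rand(2, sequence_length, width, height).astype(np.float32)
    train_y = np.random.rand(2, num_classes).astype(np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Dropout active while training
        _, c = sess.run([self.optimizer, self.cost],
                        feed_dict={x: train_x, y_true: train_y, keep_prob: 0.5})
        # Dropout disabled (keep_prob=1.0) when evaluating
        preds = sess.run(self.y_pred, feed_dict={x: train_x, keep_prob: 1.0})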

And the ConvolutionNeuralNetwork.py module:

    import tensorflow as tf

    def new_weights(shape):
        return tf.Variable(tf.truncated_normal(shape, stddev=0.05))


    def new_biases(length):
        return tf.Variable(tf.constant(0.05, shape=[length]))


    def new_conv_layer(input, num_input_channels, filter_size, num_filters, use_pooling=True, weights=None, biases=None):
        # Filter shape: [filter_height, filter_width, in_channels, out_channels]
        shape = [filter_size, filter_size, num_input_channels, num_filters]

        if weights is None:
            weights = new_weights(shape=shape)

        if biases is None:
            biases = new_biases(length=num_filters)

        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')

        layer += biases

        if use_pooling:
            # 2x2 max-pooling halves each spatial dimension ('SAME' rounds up)
            layer = tf.nn.max_pool(value=layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        layer = tf.nn.relu(layer)

        return layer, weights, biases


    def flatten_layer(layer):
        layer_shape = layer.get_shape()

        # Total number of features per example: height * width * channels
        num_features = layer_shape[1:4].num_elements()

        layer_flat = tf.reshape(layer, [-1, num_features])

        return layer_flat, num_features


    def flatten_layer_multiple(layer1, layer2):
        # Concatenate two layers along axis 1, then flatten the result
        layer = tf.concat([layer1, layer2], 1)
        layer_shape = layer.get_shape()

        num_features = layer_shape[1:4].num_elements()

        layer_flat = tf.reshape(layer, [-1, num_features])

        return layer_flat, num_features


    def new_fc_layer(input, num_inputs, num_outputs, use_relu=True, keep_prob=None):
        weights = new_weights(shape=[num_inputs, num_outputs])
        biases = new_biases(length=num_outputs)

        layer = tf.matmul(input, weights) + biases

        if use_relu:
            layer = tf.nn.relu(layer)
            if keep_prob is not None:
                layer = tf.nn.dropout(layer, keep_prob)  # assign the result, or dropout is a no-op

        return layer
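As a sanity check on the sizes: with 320x180 frames and three 2x2 max-pools ('SAME' padding rounds odd dimensions up), the conv stack should end at 40x23x36, so flatten_layer reports 33,120 features per frame. Quick arithmetic check:

    import math

    w, h = 320, 180
    for _ in range(3):          # three stride-2 pooling layers
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    print(w, h, w * h * 36)     # 40 23 33120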

Here is my learning curve (cost over time):

[image: cost plotted against training time]

Here is a sample of what the result is supposed to look like:

[image: frame with the expected ball and strike-zone boxes]

And here is what the neural network predicts:

[image: frame with the predicted boxes]

The network draws that box for the strike zone, and the box it uses for the ball's position, in exactly the same place for every pitch. Can anyone see what I am doing wrong?

0 Answers