Adding a hidden layer makes the output converge on a single value - Tensorflow

Date: 2017-10-19 17:26:59

Tags: python machine-learning tensorflow neural-network backpropagation

I started with a simple linear-regression-style network written in TensorFlow, based largely on their MNIST beginners tutorial. There are 7 input variables and 1 output variable, all of them continuous. With this model the outputs all hover around 1, which makes sense because the target output set is dominated by the value 1. Here is a sample of the outputs generated from the test data:

[ 0.95340264]
[ 0.94097006]
[ 0.96644485]
[ 0.95954728]
[ 0.93524933]
[ 0.94564033]
[ 0.94379318]
[ 0.92746377]
[ 0.94073343]
[ 0.98421943]

However, the accuracy never got above about 84%, so I decided to add a hidden layer. Now the output converges entirely on a single value, for example:

[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]
[ 0.96631247]

and the accuracy stays between 82% and 84%. When checking the obtained y value, the target y value, and the cross entropy from a single row of data over several training passes where the target output is 1, the obtained y value gradually approaches 1:

[ 0.]
[ 1.]
0.843537

[ 0.03999992]
[ 1.]
0.803543

[ 0.07999983]
[ 1.]
0.763534

[ 0.11999975]
[ 1.]
0.723541

[ 0.15999967]
[ 1.]
0.683544

and then hovers around 1 once it reaches the target:

[ 0.99136335]
[ 1.]
0.15912

[ 1.00366712]
[ 1.]
0.16013

[ 0.96366721]
[ 1.]
0.167638

[ 0.97597092]
[ 1.]
0.163856

[ 0.98827463]
[ 1.]
0.160069

However, when the target y value is 0.5, it behaves as if the target were 1, approaching 0.5 and then overshooting it:

[ 0.47648361]
[ 0.5]
0.378556

[ 0.51296818]
[ 0.5]
0.350674

[ 0.53279752]
[ 0.5]
0.340844

[ 0.55262685]
[ 0.5]
0.331016

[ 0.57245618]
[ 0.5]
0.321187

with the cross entropy continuing to decrease even as the obtained value actually moves away from the target:

[ 0.94733644]
[ 0.5]
0.168714

[ 0.96027154]
[ 0.5]
0.164533

[ 0.97320664]
[ 0.5]
0.16035

[ 0.98614174]
[ 0.5]
0.156166

[ 0.99907684]
[ 0.5]
0.151983

Printing the obtained value, the target value, and the distance to the target for the test data shows the same y value regardless of the target y:

5
[ 0.98564607]
[ 0.5]
[ 0.48564607]
6
[ 0.98564607]
[ 0.60000002]
[ 0.38564605]
7
[ 0.98564607]
[ 1.]
[ 0.01435393]
8
[ 0.98564607]
[ 1.]
[ 0.01435393]
9
[ 0.98564607]
[ 1.]
[ 0.01435393]

The code is below. a) Why does the algorithm treat the target y value as always being 1 during training, and b) why does it produce identical outputs during testing? Even if it "thinks" the target is always 1, the test output should show at least some variation, as the training output does.

import argparse
import dataset
import numpy as np
import os
import sys
import tensorflow as tf

FLAGS = None

def main(_):
    num_fields = 7
    batch_size = 100
    rating_field = 7
    outputs = 1
    hidden_units = 7

    train_data = dataset.Dataset("REPPED_RATING_TRAINING.txt", "    ", num_fields, rating_field)
    td_len = len(train_data.data)
    test_data = dataset.Dataset("REPPED_RATING_TEST.txt", " ", num_fields, rating_field)
    test_len = len(test_data.data)
    test_input = test_data.data[:, :num_fields].reshape(test_len, num_fields)
    test_target = test_data.fulldata[:, rating_field ].reshape(test_len, 1)

    graph = tf.Graph()
    with graph.as_default():
            x = tf.placeholder(tf.float32, [None, num_fields], name="x")
            W1 = tf.Variable(tf.zeros([num_fields, hidden_units]))
            b1 = tf.Variable(tf.zeros([hidden_units]))
            W2 = tf.Variable(tf.zeros([hidden_units, outputs]))
            b2 = tf.Variable(tf.zeros([outputs]))
            H = tf.add(tf.matmul(x, W1), b1, name="H")
            y = tf.add(tf.matmul(H, W2), b2, name="y")
            y_ = tf.placeholder(tf.float32, [None, outputs])
            yd = tf.abs(y_ - y)
            cross_entropy = tf.reduce_mean(yd)
            train_step = tf.train.GradientDescentOptimizer(0.04).minimize(cross_entropy)
            init = tf.global_variables_initializer()
            saver = tf.train.Saver()

    with tf.Session(graph=graph) as sess:
            sess.run(init)

            train_input, train_target = train_data.batch(td_len)
            for _ in range(FLAGS.times):
                    ts, yo, yt, ce = sess.run([train_step, y, y_, cross_entropy], feed_dict={x: train_input, y_:train_target})
                    #print obtained y, target y, and cross entropy from a given row over 10 training instances
                    print(yo[3])
                    print(yt[3])
                    print(ce)
                    print()

            checkpoint_file = os.path.join(FLAGS.model_dir, 'saved-checkpoint')
            print("\nWriting checkpoint file: " + checkpoint_file)
            saver.save(sess, checkpoint_file)

            test_input, test_target = test_data.batch(test_len)
            ty, ty_, tce, tyd = sess.run(
                    [y, y_, cross_entropy, yd],
                    feed_dict={x : test_input, y_: test_target})
            #print obtained y, target y, and distance to target for 10 random test rows
            for ix in range(10):
                    print(ix)
                    print(ty[ix])
                    print(ty_[ix])
                    print(tyd[ix])

            print()
            print('Ran times: ' + str(FLAGS.times))
            print('Acc: ' + str(1-tce))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--times', type=int, default=100,
                    help='Number of passes to train')
    parser.add_argument('--model_dir', type=str,
            default=os.path.join('.', 'tmp'),
            help='Directory for storing model info')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

1 Answer:

Answer 0 (score: 1)

There are multiple problems in your code, and many of them may be preventing the network from training properly:

  • You are initializing the weights and biases to zero. They should instead be initialized with small random values (drawn from a uniform or normal distribution); a sketch follows this list.
  • There is no activation function in your network, so it can only model linear relationships.
  • The learning rate is fixed, and it is a hyperparameter you will have to tune. You should also monitor the value of the loss function during training to make sure it decreases and then converges to a small value. If it does not, you should examine the outputs, because the network is not learning anything.
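As a rough illustration of the first two points, the graph-definition block from the question could be adjusted along these lines. This is only a sketch, assuming a ReLU nonlinearity on the hidden layer and truncated-normal initialization; the 0.1 standard deviation and 0.1 bias constant are arbitrary illustration values, not tuned settings:

    W1 = tf.Variable(tf.truncated_normal([num_fields, hidden_units], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.1, shape=[hidden_units]))
    W2 = tf.Variable(tf.truncated_normal([hidden_units, outputs], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.1, shape=[outputs]))
    # ReLU makes the hidden layer non-linear; without it the two matmuls
    # collapse into a single linear transformation.
    H = tf.nn.relu(tf.add(tf.matmul(x, W1), b1), name="H")
    y = tf.add(tf.matmul(H, W2), b2, name="y")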

In addition, if you are not normalizing your inputs and outputs, you should do that as well (a sketch follows).
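For example, assuming train_input and test_input from the question's code are NumPy arrays, min-max scaling with statistics taken from the training set might look like the sketch below; the targets printed in the question already appear to lie in [0, 1], so only the inputs are scaled here:

    import numpy as np

    # Min-max scale each input column to [0, 1] using training-set statistics,
    # then reuse the same statistics for the test set.
    col_min = train_input.min(axis=0)
    col_max = train_input.max(axis=0)
    col_range = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns

    train_input = (train_input - col_min) / col_range
    test_input = (test_input - col_min) / col_range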