Question

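I am training a classifier that takes an RGB input (three values from 0 to 255) and returns whether a black or white font (0 or 1) goes best with that color. After training, my classifier always returns 0.5 (or thereabouts) and never gets any more accurate than that.

The code is below: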

import tensorflow as tf
import numpy as np
from tqdm import tqdm

print('Creating Datasets:')

x_train = []
y_train = []

for i in tqdm(range(10000)):
    x_train.append([np.random.uniform(0, 255), np.random.uniform(0, 255), np.random.uniform(0, 255)])

for elem in tqdm(x_train):
    if (((elem[0] + elem[1] + elem[2]) / 3) / 255) > 0.5:
        y_train.append(0)
    else:
        y_train.append(1)

x_train = np.array(x_train)
y_train = np.array(y_train)

graph = tf.Graph()

with graph.as_default():

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    w_1 = tf.Variable(tf.random_normal([3, 10], stddev=1.0), tf.float32)
    b_1 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_1 = tf.sigmoid(tf.matmul(x, w_1) + b_1)

    w_2 = tf.Variable(tf.random_normal([10, 10], stddev=1.0), tf.float32)
    b_2 = tf.Variable(tf.random_normal([10]), tf.float32)
    l_2 = tf.sigmoid(tf.matmul(l_1, w_2) + b_2)

    w_3 = tf.Variable(tf.random_normal([10, 5], stddev=1.0), tf.float32)
    b_3 = tf.Variable(tf.random_normal([5]), tf.float32)
    l_3 = tf.sigmoid(tf.matmul(l_2, w_3) + b_3)

    w_4 = tf.Variable(tf.random_normal([5, 1], stddev=1.0), tf.float32)
    b_4 = tf.Variable(tf.random_normal([1]), tf.float32)
    y_ = tf.sigmoid(tf.matmul(l_3, w_4) + b_4)

    loss = tf.reduce_mean(tf.squared_difference(y, y_))

    optimizer = tf.train.AdadeltaOptimizer().minimize(loss)

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], y : y_train[index:index+128]}
            sess.run(optimizer, feed_dict=feed_dict)
            if step % 1000 == 0:
                print(sess.run([loss], feed_dict=feed_dict))

        while True:
            inp1 = int(input(''))
            inp2 = int(input(''))
            inp3 = int(input(''))
            print(sess.run(y_, feed_dict={x : [[inp1, inp2, inp3]]}))

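As you can see, I start by importing the modules I will use. Next, I generate the input x dataset and the desired output y dataset. The x_train dataset consists of 10000 random RGB values, while the y_train dataset consists of 0s and 1s, where 1 corresponds to an RGB value whose mean is below 128 and 0 corresponds to an RGB value whose mean is above 128 (this ensures that bright backgrounds get dark fonts and vice versa).

My neural network is admittedly overly complex (or so I assume), but as far as I know it is a pretty standard feed-forward network with an Adadelta optimizer and the default learning rate.

As far as my limited knowledge tells me, the training of the network is normal, but the model nonetheless always spits out 0.5.

The last block of code lets the user input values and see what they turn into when passed to the neural network.

I have messed around with different activation functions, losses, methods of initializing the biases, and so on, but to no avail. Sometimes when I tinker with the code, the model always returns 1 or always returns 0, but this is just as inaccurate as being indecisive and returning 0.5 over and over. I have not been able to find a suitable solution to my problem online. Any advice or suggestions are welcome.

Edit:

The loss, weights, biases, and output do not change much over the course of training (the weights and biases only change by hundredths or thousandths every 1000 iterations, and the loss fluctuates around 0.3). Also, the output sometimes varies with the input (as you would expect), but at other times it is constant. One run of the program produced a constant 0.7 as output, while another returned 0.5 everywhere except for inputs very close to zero, where it returned values around 0.3 or 0.4. Neither of these is the desired output. What should happen is that (255, 255, 255) maps to 0, (0, 0, 0) maps to 1, and (128, 128, 128) maps to either 1 or 0, since in the middle the font color does not really matter.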

Answer 1

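Two things I see from looking at your network:

Sigmoid activations in the hidden layers are generally a bad choice. The sigmoid function saturates for large (positive or negative) inputs, which causes the gradients to get smaller and smaller as they are backpropagated through the network. This is commonly called the "vanishing gradient" problem. It may be that the gradients for the variables near the output are "healthy", so the upper layers are learning, but if the lower layers receive no gradients they will simply keep returning random values that the higher layers cannot work with. You could try replacing the sigmoid activations with, for example, tf.nn.relu. The sigmoid in the output layer is fine (and necessary if you want the output to be in the 0/1 range), but consider using cross-entropy instead of squared error as the loss function.
Your weight initialization may produce weights that are too large. A standard deviation of 1.0 is very high. This can cause numerical problems and also make the activations even more saturated (because with large weights you can expect large pre-activation values right from the start). Try a standard deviation of something like 0.1, and consider using truncated_normal to prevent outliers (or use uniform random initialization).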
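
The saturation effect described above can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer: it pushes random RGB inputs through a single 3-to-10 sigmoid layer (bias omitted for brevity) and compares the average sigmoid derivative for weights with a standard deviation of 1.0 and 0.1. The third case, which also rescales the inputs to [0, 1], is an added assumption for comparison, and all variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    # The tanh form is numerically stable for large |z| (no overflow in exp).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def sigmoid_grad(z):
    # Derivative of the sigmoid; its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x_raw = rng.uniform(0, 255, size=(1000, 3))   # random RGB inputs, as in the question
w_std1 = rng.normal(0.0, 1.0, size=(3, 10))   # stddev 1.0, as in the question
w_std01 = 0.1 * w_std1                        # the same draw rescaled to stddev 0.1

# Average derivative over all inputs and hidden units for each setup.
grad_std1 = sigmoid_grad(x_raw @ w_std1).mean()
grad_std01 = sigmoid_grad(x_raw @ w_std01).mean()
grad_scaled = sigmoid_grad((x_raw / 255.0) @ w_std01).mean()

print('raw inputs,    stddev 1.0:', grad_std1)
print('raw inputs,    stddev 0.1:', grad_std01)
print('scaled inputs, stddev 0.1:', grad_scaled)
```

Because the sigmoid derivative peaks at 0.25, an average far below that value indicates units that pass almost no gradient back to earlier layers. Smaller weights help, and in this sketch rescaling the inputs helps further.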

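It is hard to say whether this will fix your problem, but I believe both are things you should definitely change about your network as it stands.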

Answer 2

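The biggest problem is that you are using mean squared error as the loss function for a classification problem. The cross-entropy loss function is much better suited to this kind of problem.

Here is a visualization of the difference between the cross-entropy and mean squared error loss functions:

Source: Wolfram Alpha

Notice how the cross-entropy loss increases asymptotically as the model gets further from the correct prediction (1 in this case). This curvature provides a much stronger gradient signal during backpropagation, while also satisfying many important theoretical properties of distances (divergences) between probability distributions. By minimizing the cross-entropy loss, you are in fact also minimizing the KL divergence between your model's predicted distribution and the label distribution of the training data. You can read more about the cross-entropy loss function here: http://colah.github.io/posts/2015-09-Visual-Information/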
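
The difference in gradient signal can also be checked numerically. This is a minimal sketch, assuming a single sigmoid output unit with pre-activation z and a target of 1; the function names are illustrative and do not come from the code in this thread. It compares the gradient of each loss with respect to z.

```python
import numpy as np

def sigmoid(z):
    # The tanh form is numerically stable for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def mse_grad_wrt_logit(z, target):
    # d/dz (p - t)^2 = 2 (p - t) * p (1 - p), where p = sigmoid(z)
    p = sigmoid(z)
    return 2.0 * (p - target) * p * (1.0 - p)

def xent_grad_wrt_logit(z, target):
    # d/dz [-(t log p + (1 - t) log(1 - p))] = p - t, where p = sigmoid(z)
    return sigmoid(z) - target

target = 1.0
logits = np.array([-8.0, -4.0, -1.0, 0.0, 2.0])
mse_grads = mse_grad_wrt_logit(logits, target)
xent_grads = xent_grad_wrt_logit(logits, target)

for z, g_mse, g_xent in zip(logits, mse_grads, xent_grads):
    print('z = {:5.1f}  MSE grad = {: .6f}  cross-entropy grad = {: .6f}'.format(
        z, g_mse, g_xent))
```

When the prediction is confidently wrong (z very negative with a target of 1), the squared-error gradient is multiplied by the vanishing factor p(1 - p), whereas the cross-entropy gradient stays close to -1.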

I also made a few other tweaks to improve the code and make the model easier to modify. This should solve all of your problems:

import tensorflow as tf
import numpy as np
from tqdm import tqdm

# define a random seed for (somewhat) reproducible results:
seed = 0
np.random.seed(seed)
print('Creating Datasets:')

# much faster dataset creation
x_train = np.random.uniform(low=0, high=255, size=[10000, 3])
# easier label creation
# if the average color is greater than half the color space then use black, otherwise use white
# classes:
# white = 0
# black = 1
y_train = ((np.mean(x_train, axis=1) / 255.0) > 0.5).astype(int)

# now transform dataset to be within range [-1, 1] instead of [0, 255] 
# for numeric stability and quicker model training
x_train = (2 * (x_train / 255)) - 1

graph = tf.Graph()

with graph.as_default():
    # must do this within graph scope
    tf.set_random_seed(seed)
    # specify input dims for clarity
    x = tf.placeholder(tf.float32, shape=[None, 3])
    # y is now integer label [0 or 1]
    y = tf.placeholder(tf.int32, shape=[None])
    # use relu, usually better than sigmoid 
    activation_fn = tf.nn.relu
    # from https://arxiv.org/abs/1502.01852v1
    initializer = tf.initializers.variance_scaling(
        scale=2.0, 
        mode='fan_in',
        distribution='truncated_normal')
    # better api to reduce clutter
    l_1 = tf.layers.dense(
        x,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_2 = tf.layers.dense(
        l_1,
        10,
        activation=activation_fn,
        kernel_initializer=initializer)
    l_3 = tf.layers.dense(
        l_2,
        5,
        activation=activation_fn,
        kernel_initializer=initializer)
    y_logits = tf.layers.dense(
        l_3,
        2,
        activation=None,
        kernel_initializer=initializer)

    y_ = tf.nn.softmax(y_logits)
    # much better loss function for classification
    loss = tf.reduce_mean(
        tf.losses.sparse_softmax_cross_entropy(
            labels=y, 
            logits=y_logits))
    # much better default optimizer for new problems
    # good learning rate, but probably can tune
    optimizer = tf.train.AdamOptimizer(
        learning_rate=0.01)
    # separate train op for easier calling
    train_op = optimizer.minimize(loss)

    # tell tensorflow not to allocate all gpu memory at start
    config = tf.ConfigProto()
    config.gpu_options.allow_growth=True
    with tf.Session(config=config) as sess:

        sess.run(tf.global_variables_initializer())

        print('Training:')

        for step in tqdm(range(5000)):
            index = np.random.randint(0, len(x_train) - 129)
            feed_dict = {x : x_train[index:index+128], 
                         y : y_train[index:index+128]}
            # can train and get loss in single run, much more efficient
            _, b_loss = sess.run([train_op, loss], feed_dict=feed_dict)
            if step % 1000 == 0:
                print(b_loss)

        while True:
            inp1 = int(input('Enter R pixel color: '))
            inp2 = int(input('Enter G pixel color: '))
            inp3 = int(input('Enter B pixel color: '))
            # scale to model train range [-1, 1]
            model_input = (2 * (np.array([inp1, inp2, inp3], dtype=float) / 255.0)) - 1
            if (model_input >= -1).all() and (model_input <= 1).all():
                # y_ is now two probabilities (white_prob, black_prob) but they will sum to 1.
                white_prob, black_prob = sess.run(y_, feed_dict={x : [model_input]})[0]
                print('White prob: {:.2f} Black prob: {:.2f}'.format(white_prob, black_prob))
            else:
                print('Values not within [0, 255]!')

I documented my changes with comments, but let me know if you have any questions! I ran it in my terminal and it works well:

Creating Datasets:
2018-10-05 00:50:59.156822: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-10-05 00:50:59.411003: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB
2018-10-05 00:50:59.417736: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-05 00:51:00.109351: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-05 00:51:00.113660: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]      0
2018-10-05 00:51:00.118545: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0:   N
2018-10-05 00:51:00.121605: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6370 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Training:
  0%|                                                                                         | 0/5000 [00:00<?, ?it/s]0.6222609
 19%|██████████████▋                                                               | 940/5000 [00:01<00:14, 275.57it/s]0.013466636
 39%|██████████████████████████████                                               | 1951/5000 [00:02<00:04, 708.07it/s]0.0067519126
 59%|█████████████████████████████████████████████▊                               | 2971/5000 [00:04<00:02, 733.24it/s]0.0028143923
 79%|████████████████████████████████████████████████████████████▌                | 3935/5000 [00:05<00:01, 726.36it/s]0.0073514087
100%|█████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:07<00:00, 698.32it/s]
Enter R pixel color: 1
Enter G pixel color: 1
Enter B pixel color: 1
White prob: 1.00 Black prob: 0.00
Enter R pixel color: 255
Enter G pixel color: 255
Enter B pixel color: 255
White prob: 0.00 Black prob: 1.00
Enter R pixel color: 128
Enter G pixel color: 128
Enter B pixel color: 128
White prob: 0.08 Black prob: 0.92
Enter R pixel color: 126
Enter G pixel color: 126
Enter B pixel color: 126
White prob: 0.99 Black prob: 0.01
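
The explicit placeholder shapes in the code above also matter for a reason that is easy to miss. In the question's code, y is fed an array of shape (128,) while y_ has shape (128, 1), and under broadcasting rules the squared difference of those two shapes is (128, 128) rather than (128, 1). The following NumPy sketch shows the effect. It assumes TensorFlow's elementwise ops broadcast the same way NumPy does, and the helper names are illustrative rather than taken from either code listing.

```python
import numpy as np

rng = np.random.default_rng(0)
# A batch of 0/1 labels with shape (128,), like a slice of y_train.
labels = rng.integers(0, 2, size=128).astype(np.float64)

def broadcast_mse(labels, preds_col):
    # labels: (N,), preds_col: (N, 1) -> the difference broadcasts to (N, N),
    # so every prediction is compared against every label in the batch.
    return np.mean((labels - preds_col) ** 2)

def aligned_mse(labels, preds_col):
    # Reshaping the labels to (N, 1) compares each prediction with its own label.
    return np.mean((labels.reshape(-1, 1) - preds_col) ** 2)

perfect = labels.reshape(-1, 1)                 # predictions equal to the labels
constant = np.full((128, 1), labels.mean())     # the same value for every input

print('broadcast shape:', (labels - perfect).shape)
print('broadcast loss, perfect vs constant:',
      broadcast_mse(labels, perfect), broadcast_mse(labels, constant))
print('aligned loss,   perfect vs constant:',
      aligned_mse(labels, perfect), aligned_mse(labels, constant))
```

Under the broadcast loss, a constant prediction equal to the mean label (about 0.5 for balanced classes) scores better than perfect predictions, which is consistent with a model that settles on 0.5. Declaring the shapes up front, as the rewritten code does, lets such a mismatch surface as an error instead.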

Binary classifier always returns 0.5
