Question

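I am implementing a deep neural network and initializing the weights with a pre-training algorithm based on restricted Boltzmann machines (RBMs). However, when I increase the number of hidden layers, performance decreases (for example, from 43% to 41%).

I have about 26K samples for pre-training, and my input features have dimension 98. I tried several architectures with different numbers of hidden nodes per layer (10, 50, 100) and with 1 and 2 hidden layers.

I went through the literature, and the only reason I found for performance dropping when layers are added is poor initialization. Since I am pre-training, that should not apply here.

What do you think is causing the drop in performance? Is it related to the way I pre-train, or to an insufficient amount of data? It would be great if you could point to some scientific papers as references.

What would you suggest I do to solve this problem?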

Answer 1

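It is most likely related to the pre-training, since that is the mechanism that lets you train multiple layers in the first place. I am also not sure what exactly your training algorithm is. You say your pre-training is RBM-based, but just to be sure: is your network a deep belief network (DBN)?

If so, there are many things you could be doing wrong, but I strongly recommend watching the gradients of each layer over time. If they decay towards zero or explode, your deep learning approach will not work. I would also try working with simpler data, to confirm that you can successfully learn simple functions such as XOR, sin and the like with multiple layers, and so rule out the data as the source of the error.

Finally, it is worth noting that "more layers = better performance" is not really a rule of thumb (especially for DBMs, see here). In fact, a multilayer perceptron with a single larger layer may perform better (this is partly related to the universal approximation theorem).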
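
The two checks above (learn XOR with several layers, and watch each layer's gradient) can be sketched in a few lines. This is a minimal pure-Python illustration, not the asker's RBM/DBN setup: the network sizes, learning rate, tanh/sigmoid activations and the helper names (`init_layers`, `backward`, `grad_norms`, `train_xor`) are all assumptions made for the example.

```python
import math
import random

def init_layers(sizes, rng):
    """sizes = [n_in, h1, ..., n_out]; small random weights, zero biases."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        s = 1.0 / math.sqrt(n_in)
        layers.append({
            "W": [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out,
        })
    return layers

def forward(layers, x):
    """Return the activations of every layer; tanh hidden units, sigmoid output."""
    acts = [list(x)]
    for i, layer in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + b
             for row, b in zip(layer["W"], layer["b"])]
        if i == len(layers) - 1:
            acts.append([1.0 / (1.0 + math.exp(-v)) for v in z])
        else:
            acts.append([math.tanh(v) for v in z])
    return acts

def loss(layers, x, t):
    """Binary cross-entropy for one example and a single output unit."""
    y = forward(layers, x)[-1][0]
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def backward(layers, x, t):
    """Backpropagation for one example; returns a list of (dW, db) per layer."""
    acts = forward(layers, x)
    delta = [acts[-1][0] - t]  # sigmoid + cross-entropy: dL/dz = y - t
    grads = [None] * len(layers)
    for i in reversed(range(len(layers))):
        a_prev = acts[i]
        grads[i] = ([[d * a for a in a_prev] for d in delta], list(delta))
        if i > 0:
            W = layers[i]["W"]
            # Push the error through W, then through tanh' = 1 - a^2.
            delta = [sum(W[k][j] * delta[k] for k in range(len(delta)))
                     * (1.0 - a_prev[j] ** 2) for j in range(len(a_prev))]
    return grads

def grad_norms(grads):
    """L2 norm of each layer's weight gradient, input side first."""
    return [math.sqrt(sum(g * g for row in dW for g in row)) for dW, _ in grads]

def train_xor(hidden=(4, 4), lr=0.5, epochs=2000, seed=0, report_every=500):
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    layers = init_layers([2, *hidden, 1], rng)
    for epoch in range(epochs):
        norms = [0.0] * len(layers)
        for x, t in data:
            grads = backward(layers, x, t)
            norms = [n + g for n, g in zip(norms, grad_norms(grads))]
            for layer, (dW, db) in zip(layers, grads):  # plain SGD step
                for k, row in enumerate(layer["W"]):
                    for j in range(len(row)):
                        row[j] -= lr * dW[k][j]
                    layer["b"][k] -= lr * db[k]
        if epoch % report_every == 0:
            # Mean gradient norm per layer over the four XOR examples.
            print(epoch, ["%.2e" % (n / len(data)) for n in norms])
    return layers

layers = train_xor()
```

If the printed norm of the first layer stays orders of magnitude below that of the last layer, or any norm keeps growing, that points at the optimisation rather than at the data. The same per-layer logging can be applied to the real network once the toy case behaves.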

Answer 2

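I added more layers to the TensorFlow MNIST example as a test, but I got very bad results. So it is not true that a neural network with more layers means better predictions or higher accuracy. Here is my test code, based on the TensorFlow MNIST example: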

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# This script uses the TensorFlow 1.x API (placeholders and sessions).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Import data
data_dir='/tmp/tensorflow/mnist/input_data'
mnist = input_data.read_data_sets(data_dir, one_hot=True)

# Create the model: 784 inputs -> 1568 hidden units -> 10 classes
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 784*2]))
b = tf.Variable(tf.zeros([784*2]))

# Hidden layer; the ReLU below is disabled, so this layer is purely linear
x2 = tf.matmul(x, W) + b

#reluX= tf.nn.relu(x2)

W2 = tf.Variable(tf.zeros([784*2, 10]))
b2 = tf.Variable(tf.zeros([10]))

#y = tf.matmul(reluX, W2) + b2
y = tf.matmul(x2, W2) + b2

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])

# The raw formulation of cross-entropy,
#
#   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
#                                 reduction_indices=[1]))
#
# can be numerically unstable.
#
# So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
# outputs of 'y', and then average across the batch.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
#train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Train
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(1000)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))


# Accuracy on the training set (reuses the accuracy op defined above)
print(sess.run(accuracy, feed_dict={x: mnist.train.images,
                                      y_: mnist.train.labels}))
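
One thing that may contribute to the poor accuracy of the listing above, independently of depth, is that both weight matrices start at zero. With `W`, `b` and `W2` all zero, the hidden output is zero, so the gradient of `W2` is zero, and no error flows back to `W` or `b`. Only `b2` receives a non-zero gradient, and this stays true after every update. Re-enabling the ReLU does not change this, since `relu(0) = 0`. The sketch below checks this by hand for a toy network with the same structure. The sizes (4, 8, 3), the input values and the helper names are assumptions chosen for illustration, not the MNIST dimensions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def two_layer_linear_grads(x, label, W, b, W2, b2):
    """Gradients of softmax cross-entropy for y = (x W + b) W2 + b2."""
    n_in, n_hidden, n_out = len(x), len(b), len(b2)
    h = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_hidden)]
    y = [sum(h[j] * W2[j][k] for j in range(n_hidden)) + b2[k] for k in range(n_out)]
    p = softmax(y)
    dy = [p[k] - (1.0 if k == label else 0.0) for k in range(n_out)]
    dW2 = [[h[j] * dy[k] for k in range(n_out)] for j in range(n_hidden)]
    dh = [sum(W2[j][k] * dy[k] for k in range(n_out)) for j in range(n_hidden)]
    dW = [[x[i] * dh[j] for j in range(n_hidden)] for i in range(n_in)]
    return {"W": dW, "b": dh, "W2": dW2, "b2": dy}

def max_abs(g):
    """Largest absolute entry of a vector or a list-of-lists matrix."""
    flat = [v for row in g for v in row] if isinstance(g[0], list) else g
    return max(abs(v) for v in flat)

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

n_in, n_hidden, n_out = 4, 8, 3
grads = two_layer_linear_grads(
    [0.2, 0.9, 0.0, 0.5], 1,
    zeros(n_in, n_hidden), [0.0] * n_hidden,
    zeros(n_hidden, n_out), [0.0] * n_out)

# Only b2 should show a non-zero gradient.
for name in ("W", "b", "W2", "b2"):
    print(name, max_abs(grads[name]))
```

If this is what happens in the TensorFlow script, the test says little about depth itself. Initializing the weights with small random values (for example `tf.truncated_normal` with a small `stddev`) and enabling the ReLU would make for a fairer comparison.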

Answer 3

This is probably the result of vanishing gradients. The more hidden layers you add, the less significant the weight changes become.
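
A rough way to see the effect is a chain of one-unit sigmoid layers. Each layer multiplies the backpropagated gradient by `w * sigmoid'(z)`, and `sigmoid'(z)` is never larger than 0.25, so with `w = 1` the gradient that reaches the input shrinks at least as fast as `0.25 ** depth`. The function name, the shared weight `w` and the input `x` below are assumptions made for this toy example. Real networks have many units per layer and other activations, so the exact rate will differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, w=1.0, x=0.5):
    """d(output)/d(input) for `depth` stacked one-unit sigmoid layers sharing weight w."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)  # chain rule: sigmoid'(z) = a * (1 - a) <= 0.25
    return grad

for depth in (1, 2, 4, 8):
    print(depth, input_gradient(depth))
```

The printed values fall quickly as `depth` grows, which is the sense in which updates to the early layers lose significance.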

Adding hidden layers to a deep neural network does not improve performance