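What do you think is causing the drop in performance? Is it related to the way I do the pretraining, or to an insufficient amount of data? If you could point me to some scientific papers as references, that would be great.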
Answer 0: (score: 0)
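It is most likely related to the pretraining, since that is the mechanism that lets you train multiple layers in the first place. I am still not sure what exactly your training algorithm is. You say your pretraining is RBM-based, but just to be sure: is your network a deep belief network (DBN)?
Finally, it is worth noting that "more layers = better performance" is not really a rule of thumb (especially for DBMs, see here). In fact, a multilayer perceptron with a single, larger hidden layer may perform better (this is partly related to the universal approximation theorem).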
Answer 1: (score: 0)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
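from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
# Import data
# data_dir was not defined in the original snippet; this path is an assumed placeholder.
data_dir = '/tmp/tensorflow/mnist/input_data'
mnist = input_data.read_data_sets(data_dir, one_hot=True)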
# Create the model
x = tf.placeholder(tf.float32, [None, 784])
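# Weights start from small random values. With all-zero weights in both layers,
# the gradients for W, b and W2 are exactly zero, so only b2 would ever be updated.
W = tf.Variable(tf.truncated_normal([784, 784*2], stddev=0.1))
b = tf.Variable(tf.zeros([784*2]))
x2 = tf.matmul(x, W) + b
#reluX = tf.nn.relu(x2)
W2 = tf.Variable(tf.truncated_normal([784*2, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
#y = tf.matmul(reluX, W2) + b2
y = tf.matmul(x2, W2) + b2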
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
# The raw formulation of cross-entropy,
# tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
# reduction_indices=[1]))
# can be numerically unstable.
# So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
# outputs of 'y', and then average across the batch.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
#train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)
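sess = tf.InteractiveSession()
# Variables must be initialised before the first training step.
tf.global_variables_initializer().run()
# Train
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(1000)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})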
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Accuracy on the test set
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
# Accuracy on the training set, for comparison
print(sess.run(accuracy, feed_dict={x: mnist.train.images,
                                    y_: mnist.train.labels}))
Answer 2: (score: 0)
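This is probably the result of vanishing gradients. The more hidden layers you add, the smaller the weight updates reaching the early layers become.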
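To make the effect concrete, here is a minimal sketch in plain Python. It is not the asker's network: it assumes a toy chain of scalar sigmoid units, with every weight set to the same illustrative value (`weight=1.0`) and a hypothetical input `x=0.5`. The function name `first_layer_gradient` is made up for this example. It computes how strongly the output depends on the first layer's weight as the chain gets deeper. Since the derivative of the sigmoid is at most 0.25, every extra layer multiplies that gradient by a factor of at most 0.25 when the weight is 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def first_layer_gradient(depth, weight=1.0, x=0.5):
    """Return |d(output)/d(w_1)| for a chain of `depth` scalar sigmoid units.

    Each unit computes a_k = sigmoid(weight * a_{k-1}), with a_0 = x.
    All weights share the same value but are treated as separate parameters.
    """
    # Forward pass: keep every activation for the backward pass.
    activations = [x]
    for _ in range(depth):
        activations.append(sigmoid(weight * activations[-1]))

    # Backward pass: chain rule from the output back to layer 2.
    grad = 1.0
    for k in range(depth, 1, -1):
        a_k = activations[k]
        grad *= a_k * (1.0 - a_k) * weight  # d a_k / d a_{k-1}

    # Final factor: d a_1 / d w_1 = sigmoid'(w_1 * a_0) * a_0.
    a_1 = activations[1]
    grad *= a_1 * (1.0 - a_1) * activations[0]
    return abs(grad)


# Gradient magnitude at the first layer for increasingly deep chains.
gradients = {depth: first_layer_gradient(depth) for depth in (1, 2, 4, 8)}
for depth in sorted(gradients):
    print("depth=%d  |d out / d w_1| = %.3e" % (depth, gradients[depth]))
```

The printed magnitudes shrink roughly geometrically with depth, which is why the early layers of a deep sigmoid network barely move under plain backpropagation. Real networks have many units per layer and different weights, so treat this only as an illustration of the trend.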