I'm running gradient descent with input X (24,1), output Y (6,1), and variables w and b, but the cost becomes NaN on the very first iteration, even with a learning rate of 1e-20. I also checked the gradient values for w, and they are all zero on the first iteration.
How does TensorFlow compute the gradients for GradientDescentOptimizer, and how can I track down this problem?
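For background, the gradient that gradient descent applies here can be reproduced by hand. A minimal NumPy sketch (shapes taken from the code below; all names are illustrative, not part of the original program) shows the two symptoms separately: if every ReLU pre-activation is non-positive, the gradient on w is exactly zero, and a NaN in the cost traces back to a NaN in the data rather than to the learning rate:

```python
import numpy as np

np.random.seed(0)
n_func, n_output = 24, 6
X = np.random.randn(1, n_func)
Y = np.random.randn(1, n_output)
w = np.random.randn(n_func, n_output)
b = np.random.randn(n_output)

# Forward pass: matmul + bias, ReLU, mean-squared-error cost
z = X @ w + b
pred = np.maximum(z, 0.0)
cost = np.mean((Y - pred) ** 2)

# Backward pass by hand
dpred = -2.0 * (Y - pred) / Y.size
dz = dpred * (z > 0)          # ReLU gate zeroes the gradient where z <= 0
dw = X.T @ dz                 # shape (24, 6), matches w

# Symptom 1: if z <= 0 everywhere (a "dead" ReLU layer), dw is all zeros.
z_dead = -np.abs(z) - 1.0
pred_dead = np.maximum(z_dead, 0.0)
dz_dead = (-2.0 * (Y - pred_dead) / Y.size) * (z_dead > 0)
dw_dead = X.T @ dz_dead
print(np.all(dw_dead == 0))   # True

# Symptom 2: a single NaN in the input makes the cost NaN immediately.
X_bad = X.copy()
X_bad[0, 0] = np.nan
cost_bad = np.mean((Y - np.maximum(X_bad @ w + b, 0.0)) ** 2)
print(np.isnan(cost_bad))     # True
```

All-zero gradients plus a NaN cost on the first step therefore point at the data being fed in, not at the optimizer's math.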
X = tf.placeholder(tf.float32, [1, n_func], name="X")
Y = tf.placeholder(tf.float32, [1, n_output], name="Y")
n_hl1 = n_output
# Hidden layer: weights (24, 6) and biases (6,)
hidden_layer_1 = {'w': tf.Variable(tf.random_normal([n_func, n_hl1]), name='h1w'),
                  'b': tf.Variable(tf.random_normal([n_hl1]), name='h1b')}
# (1, 24) x (24, 6) = (1, 6)
l1 = tf.add(tf.matmul(X, hidden_layer_1['w']), hidden_layer_1['b'])
l1 = tf.nn.relu(l1)
prediction = l1
# Mean-squared-error cost
cost = tf.reduce_mean(tf.square(Y - prediction), name="cost")
# Learning-rate schedule
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 1.0e-20
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           epochs / 10, 0.8, staircase=True)
# GradientDescentOptimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, global_step=global_step)

# epochs, n_test, h2 and y are defined elsewhere in the program
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(epochs):
        cost_value = 0
        for i in range(n_test):
            _, c = sess.run([optimizer, cost], {X: h2[i], Y: [y[:, i]]})
            cost_value += c
        print(cost_value)
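To narrow the problem down: at a learning rate of 1e-20 a float32 weight update underflows and changes nothing, so the NaN cannot come from the step size, which leaves the feed data. A small NumPy helper along these lines, assuming h2 and y are ordinary arrays (the helper name is hypothetical), can rule the inputs in or out:

```python
import numpy as np

def describe_feed(arr):
    """Summarize a feed array: NaN/Inf presence and magnitude."""
    a = np.asarray(arr, dtype=np.float64)
    return {
        "has_nan": bool(np.isnan(a).any()),
        "has_inf": bool(np.isinf(a).any()),
        "max_abs": float(np.abs(a).max()),
    }

# A step of size 1e-20 underflows next to a float32 weight:
w0 = np.float32(0.5)
w1 = w0 - np.float32(1e-20) * np.float32(1.0)   # one gradient step
print(w1 == w0)                                  # True: the weight cannot move

# A single bad value in the feed, by contrast, poisons the cost at once.
batch = np.random.randn(1, 24).astype(np.float32)
batch[0, 3] = np.inf
report = describe_feed(batch)
print(report["has_inf"])                         # True
```

Running every `h2[i]` and `y[:, i]` slice through such a check before the `sess.run` call would show whether the NaN originates in the data.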