I have built a neural network in Python with TensorFlow, but I can't seem to get it working with TensorFlow's eager execution. All of the gradients come out as zero, and I'm not sure where I've gone wrong in the program.
Originally I was using ReLU and thought that was the problem with the network, so I changed it to leaky ReLU. However, I haven't seen any change in the gradients.
import tensorflow as tf
# enabling eager execution
tf.enable_eager_execution()
# establishing learning rate
LEARNING_RATE = 20
TRAINING_ITERATIONS = 30
LABELS = tf.constant([0.5, 0.7, 1.0])
# print(LABELS)
# input test vector
init = tf.Variable(tf.random_normal([3, 1]))
# print(init)
# declare and initialize all weights
weight1 = tf.Variable(tf.random_normal([2, 3]))
bias1 = tf.Variable(tf.random_normal([2, 1]))
weight2 = tf.Variable(tf.random_normal([3, 2]))
bias2 = tf.Variable(tf.random_normal([3, 1]))
weight3 = tf.Variable(tf.random_normal([2, 3]))
bias3 = tf.Variable(tf.random_normal([2, 1]))
weight4 = tf.Variable(tf.random_normal([3, 2]))
bias4 = tf.Variable(tf.random_normal([3, 1]))
weight5 = tf.Variable(tf.random_normal([3, 3]))
bias5 = tf.Variable(tf.random_normal([3, 1]))
VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
# print(weight1)
def neuralNet(input, y_input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.leaky_relu((tf.matmul(weight1, input) + bias1), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight2, input) + bias2), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight3, input) + bias3), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight4, input) + bias4), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight5, input) + bias5), alpha=0.1)
    prediction = tf.nn.softmax(tf.reshape(layerResult, [-1]))
    return prediction
# print(neuralNet(init, LABELS))
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape(persistent=True) as tape:  # gradient calculation
        tape.watch(VARIABLES)
        COST = tf.reduce_sum(LABELS - neuralNet(init, LABELS))
        print(COST)
    GRADIENTS = tape.gradient(COST, VARIABLES)
    # print(GRADIENTS)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
Answer 0 (score: 0)
You don't need a persistent GradientTape here; just remove that argument.
The actual problem is that the derivative of sum(softmax) is always zero, because by definition the outputs of softmax always sum to 1. So no matter how you change the variables, the cost you have defined cannot be reduced.
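A minimal sketch of the issue, assuming TF 1.x with eager execution as in the question (the squared-error cost below is just one illustrative alternative, not necessarily the loss you want):

import tensorflow as tf
tf.enable_eager_execution()

labels = tf.constant([0.5, 0.7, 1.0])
logits = tf.Variable(tf.random_normal([3]))

with tf.GradientTape() as tape:
    # softmax outputs always sum to 1, so this "cost" is effectively a constant
    constant_cost = tf.reduce_sum(labels - tf.nn.softmax(logits))
print(tape.gradient(constant_cost, logits))  # all zeros

with tf.GradientTape() as tape:
    # a cost that depends on how the probability mass is distributed,
    # e.g. a squared error against the targets, gives non-zero gradients
    squared_error = tf.reduce_sum(tf.square(labels - tf.nn.softmax(logits)))
print(tape.gradient(squared_error, logits))  # non-zero

With a cost of that kind in your training loop, the gradients will no longer all be zero.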