Trouble with gradient descent under eager execution

Date: 2019-01-11 09:40:07

Tags: tensorflow gradient-descent eager-execution

I have built a neural network in TensorFlow using Python, but I can't seem to get it working with TensorFlow's eager execution. All of the gradients come out as zero, and I'm not sure where I've gone wrong in the program.

Originally I was using ReLU and thought that was the problem with the network, so I changed it to leaky ReLU, but I still haven't seen any change in the gradients.

import tensorflow as tf

# enabling eager execution
tf.enable_eager_execution()

# establishing learning rate and training constants
LEARNING_RATE = 20
TRAINING_ITERATIONS = 30
LABELS = tf.constant([0.5, 0.7, 1.0])
# print(LABELS)

# input test vector
init = tf.Variable(tf.random_normal([3, 1]))
# print(init)

# declare and initialize all weights and biases
weight1 = tf.Variable(tf.random_normal([2, 3]))
bias1 = tf.Variable(tf.random_normal([2, 1]))
weight2 = tf.Variable(tf.random_normal([3, 2]))
bias2 = tf.Variable(tf.random_normal([3, 1]))
weight3 = tf.Variable(tf.random_normal([2, 3]))
bias3 = tf.Variable(tf.random_normal([2, 1]))
weight4 = tf.Variable(tf.random_normal([3, 2]))
bias4 = tf.Variable(tf.random_normal([3, 1]))
weight5 = tf.Variable(tf.random_normal([3, 3]))
bias5 = tf.Variable(tf.random_normal([3, 1]))

VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
# print(weight1)


def neuralNet(input, y_input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.leaky_relu((tf.matmul(weight1, input) + bias1), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight2, input) + bias2), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight3, input) + bias3), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight4, input) + bias4), alpha=0.1)
    input = layerResult
    layerResult = tf.nn.leaky_relu((tf.matmul(weight5, input) + bias5), alpha=0.1)
    prediction = tf.nn.softmax(tf.reshape(layerResult, [-1]))
    return prediction


# print(neuralNet(init, LABELS))
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)

for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape(persistent=True) as tape:  # gradient calculation
        tape.watch(VARIABLES)
        COST = tf.reduce_sum(LABELS - neuralNet(init, LABELS))
    print(COST)
    GRADIENTS = tape.gradient(COST, VARIABLES)
    # print(GRADIENTS)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))

1 Answer:

Answer 0 (score: 0)

You don't need a persistent GradientTape here; just drop that argument.

The actual problem is that the derivative of sum(softmax) is always zero, because by definition the outputs of a softmax always sum to 1. So no matter what you do to the variables, you cannot reduce the cost as you have defined it.
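
For illustration, here is a minimal sketch of how the training loop might look with both points applied. The squared-error loss against LABELS is an assumption chosen for the example, not something specified in the question; it simply makes the cost depend on how far the prediction is from the labels, so the gradients are no longer identically zero. The network, variables, and optimizer are the same as defined above.

# minimal sketch: non-persistent tape and a loss that actually depends on the prediction
for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape() as tape:  # trainable variables are watched automatically
        prediction = neuralNet(init, LABELS)
        # assumed loss: squared error between labels and prediction
        COST = tf.reduce_sum(tf.square(LABELS - prediction))
    print(COST)
    GRADIENTS = tape.gradient(COST, VARIABLES)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))

As a side note, a learning rate of 20 is very large for Adam, so you would likely also want to reduce LEARNING_RATE to something much smaller once the gradients are flowing.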