So I tried out TensorFlow's eager execution, but my implementation wasn't successful. I used GradientTape, and while the program runs, there is no visible update to any of the weights. I have seen some sample algorithms and tutorials that use optimizer.apply_gradients() to update all the variables, but I'm assuming I am not using it properly.
import tensorflow as tf
import tensorflow.contrib.eager as tfe
# enabling eager execution
tf.enable_eager_execution()
# establishing hyperparameters
LEARNING_RATE = 20
TRAINING_ITERATIONS = 3
# establishing all LABELS
LABELS = tf.constant(tf.random_normal([3, 1]))
# print(LABELS)
# stub statement for input
init = tf.Variable(tf.random_normal([3, 1]))
# declare and initialize all weights
weight1 = tfe.Variable(tf.random_normal([2, 3]))
bias1 = tfe.Variable(tf.random_normal([2, 1]))
weight2 = tfe.Variable(tf.random_normal([3, 2]))
bias2 = tfe.Variable(tf.random_normal([3, 1]))
weight3 = tfe.Variable(tf.random_normal([2, 3]))
bias3 = tfe.Variable(tf.random_normal([2, 1]))
weight4 = tfe.Variable(tf.random_normal([3, 2]))
bias4 = tfe.Variable(tf.random_normal([3, 1]))
weight5 = tfe.Variable(tf.random_normal([3, 3]))
bias5 = tfe.Variable(tf.random_normal([3, 1]))
VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
def thanouseEyes(input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.relu(tf.matmul(weight1, input) + bias1)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight2, input) + bias2)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight3, input) + bias3)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight4, input) + bias4)
    input = layerResult
    layerResult = tf.nn.softmax(tf.matmul(weight5, input) + bias5)
    return layerResult
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
with tf.GradientTape(persistent=True) as tape: # gradient calculation
    for i in range(TRAINING_ITERATIONS):
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
        GRADIENTS = tape.gradient(COST, VARIABLES)
        optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
        print(weight1)
Answer 0 (score: 0)
Your use of the optimizer looks fine, but the computation defined by thanouseEyes() will always return [1., 1., 1.] regardless of the variables, so the gradients are always 0 and the variables will therefore never be updated (print(thanouseEyes(init)) and print(GRADIENTS) should demonstrate that).
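For example, a quick check along these lines (just a sketch assuming the variables and thanouseEyes from your program are already defined; diag_tape and diag_cost are illustrative names) should show the constant output and zero gradients:

# Diagnostic sketch: the model output does not depend on the variables.
print(thanouseEyes(init))  # [[1.], [1.], [1.]] for any weights
# Gradients of the cost with respect to the variables are therefore zero.
with tf.GradientTape() as diag_tape:
    diag_cost = tf.reduce_sum(LABELS - thanouseEyes(init))
print(diag_tape.gradient(diag_cost, VARIABLES))  # expected to be all-zero tensors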
Digging in a bit deeper: tf.nn.softmax is applied to x = tf.matmul(weight5, input) + bias5, which has shape [3, 1]. So tf.nn.softmax(x) effectively computes [softmax(x[0]), softmax(x[1]), softmax(x[2])], since tf.nn.softmax is (by default) applied to the last axis of its input. x[0], x[1], and x[2] are vectors with a single element each, so softmax(x[i]) is always 1.0.
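A tiny standalone sketch (unrelated to your model) illustrating this behaviour of tf.nn.softmax:

x = tf.constant([[2.0], [-1.0], [0.5]])  # shape [3, 1], like tf.matmul(weight5, input) + bias5
print(tf.nn.softmax(x))          # softmax over the last axis of each single-element row: [[1.], [1.], [1.]]
print(tf.nn.softmax(x, axis=0))  # softmax over the 3 rows: an actual probability distribution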
Hope that helps.
A couple of other points you may be interested in, though they are unrelated to your question:
Since TensorFlow 1.11, the tf.contrib.eager module is not needed in your program. Replace all occurrences of tfe with tf (i.e., tf.Variable instead of tfe.Variable) and you will get the same results.
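For example, the first pair of variable declarations in your program would simply become:

weight1 = tf.Variable(tf.random_normal([2, 3]))
bias1 = tf.Variable(tf.random_normal([2, 1]))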
Computation executed within the context of a GradientTape is "recorded", i.e., it holds on to intermediate tensors so that gradients can be computed later. Long story short, you want to move the GradientTape into the body of the loop:
for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape() as tape:
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
    GRADIENTS = tape.gradient(COST, VARIABLES)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))