I am using TensorFlow for some NN computations that depend on an external operation invoked through a subprocess call, and I know the analytic derivative of that subprocess. As a starting point I built a toy model in which the custom op is defined as a numpy function together with its gradient. I followed the example at https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342, which works for a simple custom op performing scalar multiplication, e.g. f(a, x) = a * x. I am trying to write the equivalent code for a simple 3D -> 1D op such as f(A, x) = matmul(x, A.T), but I am having trouble matching the shapes of the input variables and their gradients during backpropagation.
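To fix notation (my summary of the intended op, not part of the original post): with x of shape (N, 3), A of shape (1, 3) and a scalar bias b, the forward pass p = matmul(x, A.T) + b has shape (N, 1). For an incoming gradient grad of that same shape, the backward pass must return dL/dA = matmul(grad.T, x) of shape (1, 3), dL/dx = matmul(grad, A) of shape (N, 3), and dL/db = sum(grad), a scalar: one gradient per op input, each matching that input's shape.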
The code is as follows:
import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np
import time

ZERO_TOL = 1e-5
LOSS_TOL = 1e-4
SAMPLES = 100
EPOCHS = 100000

train_input = np.random.rand(SAMPLES, 3)
W_TRUE = np.ones((1, 3))
train_label = np.dot(train_input, W_TRUE.T) - 2.3  # shape=(SAMPLES, 1)

class MyException(Exception):
    pass

def _my_linear_grad(op, grad):
    return op.inputs[1], 0., 1.

def my_linear(a, x, b):
    return np.dot(x, a.T) + b

learning_rate = 1e-3
beta1 = 0.9999

x = tf.placeholder(dtype=tf.float32, shape=(None, 3), name='x')
y = tf.placeholder(dtype=tf.float32, shape=(None, 1), name='y')
a = tf.get_variable('a', dtype=tf.float32, initializer=(np.random.rand(1, 3).astype(np.float32)))
tf_a = tf.get_variable('tf_a', dtype=tf.float32, initializer=(np.random.rand(1, 3).astype(np.float32)))
b = tf.get_variable('b', dtype=tf.float32, initializer=0.)
tf_b = tf.get_variable('tf_b', dtype=tf.float32, initializer=0.)

with ops.op_scope([a, x, b], name="MyLinear") as name:
    # the custom gradient op name shouldn't conflict with any other TF op name
    unique_name = 'PyFuncGrad@Unique'
    # tf.RegisterGradient sets _my_linear_grad as the backward pass for the gradient op named unique_name
    tf.RegisterGradient(unique_name)(_my_linear_grad)
    g = tf.get_default_graph()
    # this context manager overrides the gradients of nodes created in its block
    with g.gradient_override_map({"PyFunc": unique_name}):
        # my_linear is used for the forward pass - my_linear and _my_linear_grad are wrapped inside a single TF node
        p = tf.py_func(my_linear, [a, x, b], [tf.float32], stateful=True, name=name)
tf_p = tf.tensordot(x, tf.transpose(tf_a), 1) + tf_b
loss = tf.reduce_mean(tf.square(p - y))
tf_loss = tf.reduce_mean(tf.square(tf_p - y))
train_vars = [var for var in tf.trainable_variables()]
optim = tf.train.AdamOptimizer(learning_rate, beta1)
# compute_gradients returns a list, so the gradients for tf_loss can simply be concatenated
grads_and_vars = optim.compute_gradients(loss, var_list=train_vars)
grads_and_vars += optim.compute_gradients(tf_loss, var_list=train_vars)
train_op = optim.apply_gradients(grads_and_vars)
tf.summary.scalar('loss', loss)
with tf.Session() as sess:
    train_writer = tf.summary.FileWriter('board', sess.graph)
    merge = tf.summary.merge_all()
    sess.run(tf.global_variables_initializer())
    try:
        for epoch in range(EPOCHS):
            overall_loss = 0.
            # update using each sample separately
            for i in range(SAMPLES):
                result = sess.run([loss, tf_loss, (a, b), (tf_a, tf_b), merge, train_op],
                                  feed_dict={
                                      x: train_input[i],
                                      y: train_label[i]
                                  })
    except MyException:
        pass
The error I get is:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 670, in merge_with
    self.assert_same_rank(other)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 715, in assert_same_rank
    other))
ValueError: Shapes (3,) and (1, 3) must have the same rank

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 727, in _GradientsHelper
    in_grad.set_shape(t_in.get_shape())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 509, in set_shape
    self._shape_val = self.shape.merge_with(shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 676, in merge_with
    raise ValueError("Shapes %s and %s are not compatible" % (self, other))
ValueError: Shapes (3,) and (1, 3) are not compatible

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test6.py", line 69, in <module>
    grads_and_vars = optim.compute_gradients(loss, var_list=train_vars)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 511, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 532, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 734, in _GradientsHelper
    (op.name, i, t_in.shape, in_grad.shape))
ValueError: Incompatible shapes between op input and calculated input gradient. Forward operation: MyLinear. Input index: 0. Original input shape: (1, 3). Calculated input gradient shape: (3,)
I have tried several different ways of returning the gradients (e.g. making them np.arrays of shape (1, 3) and converting them to tensors via tf.convert_to_tensor), and the version shown here is not even the cleverest, but whatever I do I keep getting errors. The underlying problem is that I cannot fully understand the relationship between an op and its gradient. I have looked around, but most of what I found concerns errors with built-in operations. I would be grateful if anyone could help me.
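For reference, here is a minimal sketch (my illustration, not code from the original post) of a gradient function whose return values match the shapes of the op inputs, assuming the forward pass is p = matmul(x, a.T) + b with a of shape (1, 3), x of shape (None, 3) and a scalar b:

def _my_linear_grad(op, grad):
    # grad carries the shape of the op output, here (None, 1)
    a, x, b = op.inputs
    grad_a = tf.matmul(grad, x, transpose_a=True)  # (1, None) x (None, 3) -> (1, 3), matches a
    grad_x = tf.matmul(grad, a)                    # (None, 1) x (1, 3) -> (None, 3), matches x
    grad_b = tf.reduce_sum(grad)                   # scalar, matches b
    return grad_a, grad_x, grad_b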
Answer 0 (score: 0)
Well, I managed to get the code running with a 3D input variable and a sin(x) activation function. It seems to work; it just needed the right tf functions to compute A*x, plus some reshaping. The gradient function is
def _my_linear_grad(op, grad):
    a, x, b = op.inputs
    cos_term = tf.cos(tf.tensordot(a, x, 1) + b)
    # one gradient per op input, in the order (a, x, b)
    return (grad * tf.reshape(cos_term * x, (1, 3)),
            grad * tf.convert_to_tensor(np.zeros((3,)).astype(np.float32)),
            grad * cos_term)
and the forward function is
def my_linear(a, x, b):
    return np.sin(np.dot(x, a.T).astype(np.float32) + b.astype(np.float32))
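As a quick sanity check (my addition, not part of the original answer), the hand-written gradient can be compared against TensorFlow's autodiff on an equivalent native graph:

import numpy as np
import tensorflow as tf

a = tf.constant(np.random.rand(1, 3).astype(np.float32))
x = tf.constant(np.random.rand(3).astype(np.float32))
b = tf.constant(0.5)

p = tf.sin(tf.tensordot(a, x, 1) + b)  # native re-implementation of my_linear
grads = tf.gradients(p, [a, b])        # autodiff gradients w.r.t. a and b

with tf.Session() as sess:
    # these values should match what _my_linear_grad returns for grad == 1
    print(sess.run(grads))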
The loss function reaches values below 1e-6, and the weights it recovers are correct.
Still, the loss computed through the py_func wrapper and the loss computed directly via

tf_p = tf.sin(tf.tensordot(tf_a, x, 1) + tf_b)
tf_loss = tf.reduce_mean(tf.square(tf_p - y))

are not the same, and I would like to know why.
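One thing worth ruling out (my suggestion, not from the original answer): a and tf_a are initialized with two independent random draws, so the two models start from different points and their losses will differ along the way even if both gradients are correct. A drop-in change to the variable definitions in the script above, giving both pairs identical starting values, would show whether any remaining difference comes from the ops themselves:

# hypothetical change: share one initial value between the two models
init_w = np.random.rand(1, 3).astype(np.float32)
a = tf.get_variable('a', dtype=tf.float32, initializer=init_w)
tf_a = tf.get_variable('tf_a', dtype=tf.float32, initializer=init_w.copy())
b = tf.get_variable('b', dtype=tf.float32, initializer=0.)
tf_b = tf.get_variable('tf_b', dtype=tf.float32, initializer=0.)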