TensorFlow custom gradients for custom ops with more than one dimension

Asked: 2018-09-05 09:07:08

Tags: python tensorflow machine-learning

I am using TensorFlow for some NN computations that depend on an external operation invoked via a subprocess call, for which I know the analytic derivative. As a starting point I built a toy model in which I define the custom op and its gradient as numpy functions. I followed the example at https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342, which works for a simple custom op performing scalar multiplication, e.g. f(a, x) = a * x. I am now trying to write the equivalent code for a simple 3D -> 1D operation such as f(A, x) = matmul(x, A.T), but I am having trouble matching the shapes of the input variables and their gradients during backpropagation.

The code is as follows:

    import tensorflow as tf
    from tensorflow.python.framework import ops
    import numpy as np
    import time

    ZERO_TOL = 1e-5
    LOSS_TOL = 1e-4
    SAMPLES = 100
    EPOCHS = 100000

    train_input = np.random.rand(SAMPLES,3)
    W_TRUE = np.ones((1,3))
    train_label = np.dot(train_input,W_TRUE.T) - 2.3 #shape=(SAMPLES,1)

    class MyException(Exception):
        pass


    def _my_linear_grad(op, grad):
        return op.inputs[1], 0., 1.

    def my_linear(a, x, b):
        return np.dot(x,a.T) + b


    learning_rate = 1e-3
    beta1 = 0.9999

    x = tf.placeholder(dtype=tf.float32, shape=(None,3), name='x')
    y = tf.placeholder(dtype=tf.float32, shape=(None,1), name='y')

    a = tf.get_variable('a', dtype=tf.float32, initializer=(np.random.rand(1,3).astype(np.float32)))
    tf_a = tf.get_variable('tf_a', dtype=tf.float32, initializer=(np.random.rand(1,3).astype(np.float32)))
    b = tf.get_variable('b', dtype=tf.float32, initializer=0.)
    tf_b = tf.get_variable('tf_b', dtype=tf.float32, initializer=0.)

    with ops.op_scope([a, x, b], name="MyLinear") as name:
        # custom gradient op name shouldn't conflict with any other TF op name
        unique_name = 'PyFuncGrad@Unique'
        # using tf.RegisterGradient to set _my_linear_grad function in backward pass for gradient op named rnd_name
        tf.RegisterGradient(unique_name)(_my_linear_grad)

        g = tf.get_default_graph()

        # context manager used to override gradients for nodes created in its block
        with g.gradient_override_map({"PyFunc": unique_name}):
            # my_linear is used for the forward pass - my_linear and _my_linear_grad are wrapped inside a single TF node
            p = tf.py_func(my_linear, [a, x, b], [tf.float32], stateful=True, name=name)

    # native TF reference: contract the last axis of x with the first axis of tf_a.T
    tf_p = tf.tensordot(x, tf.transpose(tf_a), 1) + tf_b

    loss = tf.reduce_mean(tf.square(p - y))
    tf_loss = tf.reduce_mean(tf.square(tf_p - y))

    train_vars = tf.trainable_variables()
    optim = tf.train.AdamOptimizer(learning_rate, beta1)

    # compute_gradients returns a list so I can just concatenate them to calculate tf_loss, too
    grads_and_vars = optim.compute_gradients(loss, var_list=train_vars)
    grads_and_vars += optim.compute_gradients(tf_loss, var_list=train_vars)
    train_op = optim.apply_gradients(grads_and_vars)

    tf.summary.scalar('loss', loss)

    with tf.Session() as sess:
        train_writer = tf.summary.FileWriter('board', sess.graph)
        merge = tf.summary.merge_all()

        sess.run(tf.global_variables_initializer())

        try:
            for epoch in range(EPOCHS):
                overall_loss = 0.
                # update using each sample separately
                for i in range(SAMPLES):
                    result = sess.run([loss, tf_loss, (a, b), (tf_a, tf_b), merge, train_op],
                                      feed_dict={x: train_input[i], y: train_label[i]})

        except MyException:
            pass

我得到的错误是

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 670, in merge_with
    self.assert_same_rank(other)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 715, in assert_same_rank
    other))
ValueError: Shapes (3,) and (1, 3) must have the same rank

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 727, in _GradientsHelper
    in_grad.set_shape(t_in.get_shape())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 509, in set_shape
    self._shape_val = self.shape.merge_with(shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 676, in merge_with
raise ValueError("Shapes %s and %s are not compatible" % (self, other))
ValueError: Shapes (3,) and (1, 3) are not compatible

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test6.py", line 69, in <module>
    grads_and_vars = optim.compute_gradients(loss, var_list=train_vars)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 511, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 532, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 734, in _GradientsHelper
    (op.name, i, t_in.shape, in_grad.shape))
ValueError: Incompatible shapes between op input and calculated input gradient.  Forward operation: MyLinear.  Input index: 0. Original input shape: (1, 3).  Calculated input gradient shape: (3,) 

I have tried several different ways of returning the gradient (e.g. turning it into an np.array of shape (1, 3) and converting it to a tensor with tf.convert_to_tensor), and the version reported here is not even the smartest one, but I keep getting errors regardless. The underlying problem is that I do not fully understand the relationship between the op and its gradients. I have looked around, but most of what I found concerns errors with built-in ops. I would be grateful if anyone could help.
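For reference, the contract behind that last traceback, as a minimal sketch (the function name and the gradient math here are illustrative, assuming the per-sample shapes from the code above: a is (1, 3), x is (1, 3), b is a scalar): the function registered with tf.RegisterGradient receives the op and the incoming gradient, and must return one tensor per op input, each matching that input's shape exactly.

    def _shape_matching_grad(op, grad):
        # one gradient tensor per op input, each shaped exactly like that input
        a_in, x_in, _ = op.inputs
        grad_a = tf.reshape(grad * x_in, tf.shape(a_in))  # (1, 3), not (3,)
        grad_x = tf.zeros_like(x_in)                      # x is fed data, no update needed
        grad_b = tf.reduce_sum(grad)                      # scalar, like b
        return grad_a, grad_x, grad_b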

1 Answer:

Answer 0 (score: 0)

OK, I managed to get the code running with a 3D input variable and a sin(x) activation function. It seems to work; it just needed the right tf function to compute A * x, plus some reshaping. The gradient function is

    def _my_linear_grad(op, grad):
        # op.inputs = (a, x, b); return one gradient per input, shaped like it
        cos_term = tf.cos(tf.tensordot(op.inputs[0], op.inputs[1], 1) + op.inputs[2])
        grad_a = grad * tf.reshape(cos_term * op.inputs[1], (1, 3))
        grad_x = grad * tf.convert_to_tensor(np.zeros((3,)).astype(np.float32))
        grad_b = grad * cos_term
        return grad_a, grad_x, grad_b

and the forward function is

def my_linear(a, x, b):
    return np.sin(np.dot(x,a.T).astype(np.float32) + b.astype(np.float32))

The loss reaches values below 1e-6, and the recovered weights are correct.
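As a quick check (a small sketch using the variable names above; the expected values come from W_TRUE and the -2.3 offset used to generate the training labels):

    # inspect the learned parameters after training
    a_val, b_val = sess.run([a, b])
    print(a_val)  # should be close to [[1. 1. 1.]]
    print(b_val)  # should be close to -2.3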

Still, the loss computed through the py_func wrapper and the loss computed directly via
    tf_p = tf.sin(tf.tensordot(tf_a, x, 1) + tf_b)
    tf_loss = tf.reduce_mean(tf.square(tf_p - y))

are not the same, and I am wondering why.
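One way to narrow this down (a diagnostic sketch, not part of the code above: it ties the two branches' weights and compares a single forward pass, so any remaining difference is numerical rather than a training artifact; a plausible culprit is that tf.py_func round-trips through NumPy, where np.dot may compute in float64 before the final cast):

    # hypothetical check: copy the py_func branch's weights into the native branch
    sess.run([tf.assign(tf_a, a), tf.assign(tf_b, b)])
    p_val, tf_p_val = sess.run([p, tf_p], feed_dict={x: train_input[i], y: train_label[i]})
    print(np.max(np.abs(np.asarray(p_val) - np.asarray(tf_p_val))))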