How to use gradient_override_map in TensorFlow 2.0?

Asked: 2019-04-19 16:12:29

Tags: python tensorflow tensorflow2.0

I am trying to use gradient_override_map with TensorFlow 2.0. There is an example in the documentation, which I will also use as the example here.

In 2.0, GradientTape can be used to compute gradients as follows:

import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha0

x = tf.Variable(5.0)
with tf.GradientTape() as tape:
    s_1 = tf.square(x)
print(tape.gradient(s_1, x))  # tf.Tensor(10.0, shape=(), dtype=float32)

There is also the tf.custom_gradient decorator, which can be used to define the gradient of a new function (again using the example from the docs):

import tensorflow as tf
print(tf.version.VERSION)  # 2.0.0-alpha

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        return dy * (1 - 1 / (1 + e))

    return tf.math.log(1 + e), grad

x = tf.Variable(100.)

with tf.GradientTape() as tape:
    y = log1pexp(x)

print(tape.gradient(y, x))
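
As a side note (my own sketch, not part of the original question): the reason the docs use a custom gradient for log1pexp is numerical stability. Without @tf.custom_gradient, tf.exp(100.) overflows to inf in float32 and the automatically derived gradient evaluates to nan, while the version above prints 1.0:

import tensorflow as tf

x = tf.Variable(100.)
with tf.GradientTape() as tape:
    # Naive version without a custom gradient: exp(100) overflows to inf,
    # so autodiff computes exp(x) / (1 + exp(x)) = inf / inf = nan.
    y = tf.math.log(1 + tf.exp(x))

print(tape.gradient(y, x))  # tf.Tensor(nan, shape=(), dtype=float32)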

However, I would like to replace the gradient of a standard function such as tf.square. I tried to use the following code:

@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
  return tf.constant(0.0)

with tf.Graph().as_default() as g:
    x = tf.Variable(5.0)
    with g.gradient_override_map({"Square": "CustomSquare"}):
        with tf.GradientTape() as tape:
            s_2 = tf.square(x, name="Square")

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        print(sess.run(tape.gradient(s_2, x)))  # prints 10.0, not 0.0

However, there are two problems: the gradient override does not seem to take effect (it evaluates to 10.0 instead of 0.0), and I have to resort to session.run() to execute the graph. Is it possible to achieve this in "native" TensorFlow 2.0?

In TensorFlow 1.12.0, the following code produces the desired output:

import tensorflow as tf
print(tf.__version__)  # 1.12.0

@tf.RegisterGradient("CustomSquare")
def _custom_square_grad(op, grad):
  return tf.constant(0.0)

x = tf.Variable(5.0)

g = tf.get_default_graph()
with g.gradient_override_map({"Square": "CustomSquare"}):
    s_2 = tf.square(x, name="Square")
grad = tf.gradients(s_2, x)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(grad))  # [0.0]

2 Answers:

Answer 0 (score: 3)

There is no built-in mechanism in TensorFlow 2.0 to override all gradients of a built-in operator within a scope. However, if you are able to modify the call site for each call to the built-in operator, you can use the tf.custom_gradient decorator as follows:

@tf.custom_gradient
def custom_square(x):
  def grad(dy):
    return tf.constant(0.0)
  return tf.square(x), grad

with tf.Graph().as_default() as g:
  x = tf.Variable(5.0)
  with tf.GradientTape() as tape:
    s_2 = custom_square(x)

  with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())            
    print(sess.run(tape.gradient(s_2, x)))
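
Running this prints 0.0: the tape now invokes the grad function defined inside custom_square instead of the built-in gradient of tf.square.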

Answer 1 (score: 2)

In addition to mrry's answer, I would like to add two points:

(1) In TF 2, we can use tf.GradientTape directly, without building a graph, like this:

@tf.custom_gradient
def custom_square(x):
  def grad(dy):
    return tf.constant(0.0)
  return tf.square(x), grad

with tf.GradientTape() as tape:
  x = tf.Variable(5.0)
  s_2 = custom_square(x)

print(tape.gradient(s_2, x).numpy())
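
This prints:

0.0

No graph construction or Session is required; the custom gradient takes effect under eager execution.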

(2) Multiply your custom grad by the previous grad (dy)

Note that gradient computation is a chained computation: we should multiply our custom grad by dy (the gradient computed upstream). Otherwise, our custom function breaks the chain rule. Here is an example:

@tf.custom_gradient
def custom_square(x):
  def grad(dy):
    return tf.constant(4.0)
  return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
  x = tf.Variable(5.0)
  s_2 = custom_square(x)
  s_4 = custom_square(s_2)

print("Grad from s_4 to x: ",tape.gradient(s_4,x).numpy())
print("Grad from s_4 to s_2: ",tape.gradient(s_4,s_2).numpy())
print("Grad from s_2 to x: ",tape.gradient(s_2,x).numpy())

The result:

Grad from s_4 to x:  4.0
Grad from s_4 to s_2:  4.0
Grad from s_2 to x:  4.0

The grad from s_4 to x should be 16.0: by the chain rule it is the grad from s_4 to s_2 multiplied by the grad from s_2 to x, i.e. 4.0 × 4.0.

But the result is 4.0, which means the gradient from the previous step was not accumulated.

Multiplying the custom grad by dy solves the problem:

@tf.custom_gradient
def custom_square(x):
  def grad(dy):
    return tf.constant(4.0) * dy
  return tf.square(x), grad

with tf.GradientTape(persistent=True) as tape:
  x = tf.Variable(5.0)
  s_2 = custom_square(x)
  s_4 = custom_square(s_2)

print("Grad from s_4 to x: ",tape.gradient(s_4,x).numpy())
print("Grad from s_4 to s_2: ",tape.gradient(s_4,s_2).numpy())
print("Grad from s_2 to x: ",tape.gradient(s_2,x).numpy())

Here is the result:

Grad from s_4 to x:  16.0
Grad from s_4 to s_2:  4.0
Grad from s_2 to x:  4.0
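
Now the grad from s_4 to x is correctly accumulated through the chain: 4.0 × 4.0 = 16.0.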

You can try the implementation in Colab here: https://colab.research.google.com/drive/1gbLopOLJiyznDA-Cr473bZEeWkWh_KGG?usp=sharing