Question

Problem: a very long RNN

N1 -- N2 -- ... -- N100

For optimizers such as AdamOptimizer, compute_gradients() returns the gradients for all trainable variables.
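For comparison, the usual remedy clips only these final per-variable gradients. Below is a minimal sketch. It assumes TensorFlow 2.x, where tf.GradientTape.gradient plays the role of compute_gradients(). It also assumes a hypothetical linear recurrence standing in for N1 ... N100, run for 30 steps instead of 100 so the squared norms stay within float32 range:

```python
import tensorflow as tf

# Hypothetical stand-in for N1 -- ... -- N100: a linear recurrence h_t = h_{t-1} @ w
# whose gradients grow roughly like 1.5 ** num_steps.
w = tf.Variable(1.5 * tf.eye(4))

def clipped_variable_gradients(h0, num_steps=30, max_global_norm=5.0):
    with tf.GradientTape() as tape:
        h = h0
        for _ in range(num_steps):
            h = tf.matmul(h, w)
        loss = tf.reduce_sum(h)
    grads = tape.gradient(loss, [w])  # TF2 counterpart of compute_gradients()
    # Only the final dL/dw is clipped; the intermediate dL/dh_t values computed
    # inside the backward pass are left untouched and may already have exploded.
    clipped, global_norm = tf.clip_by_global_norm(grads, max_global_norm)
    # The clipped list could then be applied, e.g. optimizer.apply_gradients(zip(clipped, [w])).
    return clipped, global_norm

clipped, norm_before = clipped_variable_gradients(tf.ones((1, 4)))
print(float(norm_before), float(tf.linalg.global_norm(clipped)))
```

Here norm_before comes out at roughly 1.5e7 while the clipped gradient has norm 5.0. Nothing inside the backward pass itself was limited.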

However, the gradients can explode at some intermediate step.

So how can I clip those intermediate gradients?

One way would be to do the backprop manually: first "N100 -> N99", clip the gradients, then "N99 -> N98", and so on. But that is far too complicated.
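The manual approach can be sketched as follows, assuming TensorFlow 2.x. Here step and manual_clipped_backprop are hypothetical names, and the cell is a toy linear recurrence. Each step is replayed under its own tf.GradientTape, and the incoming gradient is passed in through output_gradients. The result is then clipped before moving on to the previous step:

```python
import tensorflow as tf

def step(h, w):
    # Hypothetical RNN cell: one step of the linear recurrence h_t = h_{t-1} @ w.
    return tf.matmul(h, w)

def manual_clipped_backprop(h0, w, num_steps, max_norm):
    """Backprop N_T -> N_{T-1} -> ... -> N_1 by hand, clipping dL/dh_t at every step."""
    # Forward pass: keep every hidden state so each step can be replayed.
    states = [h0]
    for _ in range(num_steps):
        states.append(step(states[-1], w))
    loss = tf.reduce_sum(states[-1])

    grad_h = tf.ones_like(states[-1])  # dL/dh_T, since loss = reduce_sum(h_T)
    grad_w = tf.zeros_like(w)
    for t in reversed(range(num_steps)):
        h_prev = states[t]
        with tf.GradientTape() as tape:
            tape.watch([h_prev, w])
            h_next = step(h_prev, w)
        # Vector-Jacobian product of this single step with the incoming gradient.
        g_prev, g_w = tape.gradient(h_next, [h_prev, w], output_gradients=grad_h)
        grad_w += g_w                               # accumulate dL/dw over all steps
        grad_h = tf.clip_by_norm(g_prev, max_norm)  # clip the intermediate gradient
    return loss, grad_w, grad_h

h0 = tf.ones((1, 4))
w = 1.5 * tf.eye(4)
loss, grad_w, grad_h0 = manual_clipped_backprop(h0, w, num_steps=30, max_norm=1.0)
```

It works, but every hidden state has to be kept and the backward loop has to be written by hand. That is exactly the complexity the question is trying to avoid.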

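So my question is: is there a simpler way to clip intermediate gradients? (Strictly speaking, of course, they are then no longer gradients in the mathematical sense.)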

Answer 1

You can use the custom_gradient decorator to make a version of tf.identity that clips exploding intermediate gradients.

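```python
# TensorFlow 1.x (tf.contrib)
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
  result = tf.identity(tensor)

  def grad(dresult):
    # Clip the incoming gradient; max_norm itself receives no gradient.
    return tf.clip_by_norm(dresult, max_norm), None

  return result, grad
```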

Then use gradient_clipping_identity wherever you would normally use tf.identity, and the gradients flowing through it will be clipped on the backward pass.
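A TensorFlow 2.x sketch of the same idea follows. It assumes tf.custom_gradient (the core-API counterpart of tfe.custom_gradient) and eager execution. Here max_norm is captured by a closure rather than passed as a second argument, so the backward function only has to return a gradient for the tensor. The hypothetical unrolled_rnn shows where the wrapper goes: around each hidden state, so that dL/dh_t is clipped before it flows into the previous step.

```python
import tensorflow as tf

def gradient_clipping_identity(tensor, max_norm):
    """Identity on the forward pass; clips the incoming gradient to max_norm on the backward pass."""
    @tf.custom_gradient
    def _clip_identity(x):
        def grad(dy):
            return tf.clip_by_norm(dy, max_norm)
        return tf.identity(x), grad
    return _clip_identity(tensor)

def unrolled_rnn(h0, w, num_steps, max_norm):
    # Hypothetical linear recurrence standing in for N1 -- N2 -- ... -- N100.
    h = h0
    for _ in range(num_steps):
        # Each hidden state passes through the clipping identity, so the gradient
        # reaching it is clipped before flowing into the previous step.
        h = gradient_clipping_identity(tf.matmul(h, w), max_norm)
    return tf.reduce_sum(h)

h0 = tf.ones((1, 4))
w = 1.5 * tf.eye(4)
with tf.GradientTape() as tape:
    tape.watch(h0)
    loss = unrolled_rnn(h0, w, num_steps=30, max_norm=1.0)
grad_h0 = tape.gradient(loss, h0)
print(grad_h0.numpy())  # [[0.75 0.75 0.75 0.75]]
```

Without the wrapper the same gradient would be 1.5 ** 30 in every entry. With it, each clipped gradient has norm 1, and the one extra multiplication by w leaves 0.75 per entry.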

Answer 2

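```python
import tensorflow as tf
@tf.custom_gradient
def gradient_clipping(x):
  # Identity on the forward pass; on the backward pass, clip the gradient to an L2 norm of 10.0.
  return x, lambda dy: tf.clip_by_norm(dy, 10.0)
```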
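To see the effect of this fixed threshold, here is a small eager-mode check, assuming TensorFlow 2.x. The decorator is repeated so the snippet runs on its own. The upstream gradient is [100, 100], with a norm of about 141.4, so it is rescaled to norm 10 before it reaches x:

```python
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Forward: pass x through unchanged. Backward: clip dy to an L2 norm of at most 10.0.
    return x, lambda dy: tf.clip_by_norm(dy, 10.0)

x = tf.constant([1.0, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(100.0 * gradient_clipping(x))  # upstream dy = [100, 100]
grad_x = tape.gradient(loss, x)
print(grad_x.numpy())  # approximately [7.071 7.071], i.e. 10 / sqrt(2) each
```

Because the threshold is hard-coded, using different limits for different layers would require a closure or factory around the decorated function.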