Question

I have a fairly complex model at hand. The model has multiple parts with a linear structure:

y = theano.tensor.dot(W,x) + b

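I want to build an optimizer that computes the gradients of all the linear structures with a custom rule, while leaving all other operations unchanged. What is the simplest way to override the gradient operations for all linear parts of my model? Preferably without having to write a new Op.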

Answer 1

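So, I spent some time working on a PR for Theano (~~not merged as of Jan 13, 2017~~ now merged) that lets users partially override the gradient of a theano.OpFromGraph instance. The override is done through a symbolic graph, so you still get the full benefit of Theano's optimizations.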

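Typical use cases:

- Numerical safety considerations
- Rescaling/clipping gradients
- Specialized gradient routines such as the Riemannian natural gradient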

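To make an Op with an overridden gradient:

1. Build the compute graph you need
2. Make an OpFromGraph (OfG) instance for the gradient of your Op
3. Make an OfG instance for your Op, with the grad_overrides argument set
4. Call the OfG instance to build your model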
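For the linear parts in the question, the gradient graph in the steps above has to return one expression per input of y = dot(W, x) + b. As a sketch only: writing g for the gradient backpropagated into y (with x and b as vectors and W as a matrix), the default rule and one possible override look as follows. The clipping rule and the threshold c are illustrative assumptions, not something taken from the PR.

```latex
% Default gradients of y = W x + b, given g = \partial L / \partial y
\[
\begin{aligned}
\frac{\partial L}{\partial W} &= g\,x^{\top}, &
\frac{\partial L}{\partial x} &= W^{\top} g, &
\frac{\partial L}{\partial b} &= g
\end{aligned}
\]
% Illustrative override: substitute a clipped gradient everywhere g appears
% (c is a hypothetical threshold)
\[
\tilde{g} = \operatorname{clip}(g, -c, c)
\]
```

Any other custom rule fits the same pattern: the gradient graph takes (W, x, b, g) as inputs and returns three expressions with the shapes of W, x and b.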

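Defining an OpFromGraph is like compiling a theano function, with a few differences:

- No support for updates and givens (as of January 2017)
- You get a symbolic Op instead of a numerical function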

Example:

    '''
    This creates an atan2_safe Op with smoothed gradient at (0,0)
    '''
    import theano as th
    import theano.tensor as T

    # Turn this on if you want theano to build one large graph for your model
    # instead of precompiling the small graph.
    USE_INLINE = False
    # In a real case you would set EPS to a much smaller value
    EPS = 0.01

    # define a graph for needed Op
    s_x, s_y = T.scalars('xy')
    s_darg = T.scalar()  # backpropagated gradient
    s_arg = T.arctan2(s_y, s_x)
    s_abs2 = T.sqr(s_x) + T.sqr(s_y) + EPS
    s_dx = -s_y / s_abs2
    s_dy = s_x / s_abs2

    # construct OfG with gradient overrides
    # NOTE: there are unused inputs in the gradient expression,
    #     however the input count must match, so we pass
    #     on_unused_input='ignore'
    atan2_safe_grad = th.OpFromGraph([s_x, s_y, s_darg], [s_dx, s_dy],
                                     inline=USE_INLINE, on_unused_input='ignore')
    atan2_safe = th.OpFromGraph([s_x, s_y], [s_arg],
                                inline=USE_INLINE, grad_overrides=atan2_safe_grad)

    # build graph using the new Op
    x, y = T.scalar(), T.scalar()
    arg = atan2_safe(x, y)
    dx, dy = T.grad(arg, [x, y])
    fn = th.function([x, y], [dx, dy])
    fn(1., 0.)  # gives approximately [-0.0, 0.990099]
    fn(0., 0.)  # gives [0.0, 0.0], no more annoying nan!

Note: this is still largely experimental; expect bugs.
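The values quoted in the example follow directly from the smoothed gradient expressions, so they can be checked without Theano. Below is a minimal pure-Python sketch of that arithmetic. The function names `atan2_grad_exact` and `atan2_grad_smoothed` are made up for illustration and are not part of Theano. The exact version returns NaN at the origin to mirror the 0/0 that the unsmoothed expression runs into there.

```python
import math

EPS = 0.01  # same smoothing constant as in the example above


def atan2_grad_exact(x, y):
    """Analytic gradient of atan2(y, x) with respect to (x, y)."""
    r2 = x * x + y * y
    if r2 == 0.0:
        # The unsmoothed expression is 0/0 here; report it as NaN.
        return (math.nan, math.nan)
    return (-y / r2, x / r2)


def atan2_grad_smoothed(x, y, eps=EPS):
    """Same expressions, with eps added to the denominator."""
    r2 = x * x + y * y + eps
    return (-y / r2, x / r2)


if __name__ == "__main__":
    for point in [(1.0, 0.0), (0.0, 0.0)]:
        print(point, atan2_grad_exact(*point), atan2_grad_smoothed(*point))
```

At (1, 0) the smoothed dy is 1 / 1.01, roughly 0.990099, slightly below the exact value of 1. At (0, 0) both smoothed components are zero instead of NaN. A smaller EPS narrows the gap away from the origin.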

Theano - How to override the gradient for part of an op graph
