Question

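Suppose I have an n-layer neural network. After running layer l, I want to apply some transformation to the output of the l-th layer without including that transformation in backpropagation.

For example: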

output_layer_n = self.LinearLayer(output_layer_prev)
# Apply some transformation to output_layer_n without taking autograd w.r.t. it;
# the transformation function has no parameters.
output_layer_n.data = TransformationFunction(output_layer_n.data)

How should I implement this? What I want is for the gradient of TransformationFunction() not to be taken into account in my code.

Answer 1

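If you don't want to compute gradients for TransformationFunction, the easiest way is to turn off gradient computation for all parameters involved in that computation by setting their requires_grad flag to False.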

Excluding subgraphs from backward:

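If there's a single input to an operation that requires gradient, its output will also require gradient. Conversely, only if all inputs don't require gradient, the output also won't require it. Backward computation is never performed in the subgraphs where all Tensors didn't require gradients.

This is especially useful when you want to freeze part of your model, or you know in advance that you're not going to use gradients w.r.t. some parameters. For example, if you want to finetune a pretrained CNN, it's enough to switch the requires_grad flags in the frozen base, and no intermediate buffers will be saved until the computation gets to the last layer, where the affine transform will use weights that require gradient, and the output of the network will also require them.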
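A minimal sketch of the two points quoted above. It assumes a tiny nn.Sequential as a stand-in for a pretrained base; the names base and head and the layer sizes are illustrative and do not come from the documentation.

```python
import torch
import torch.nn as nn

# Propagation rule: an output requires grad if at least one input does.
a = torch.randn(3, requires_grad=True)
b = torch.randn(3)  # requires_grad defaults to False
c = torch.randn(3)
needs_grad = (a + b).requires_grad  # True: one input requires grad
no_grad = (b + c).requires_grad     # False: no input requires grad

# Freezing a base (hypothetical stand-in for a pretrained network).
base = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
head = nn.Linear(8, 2)  # the only part that should be trained
for p in base.parameters():
    p.requires_grad = False

x = torch.randn(4, 8)
features = base(x)      # no graph is recorded for the frozen base
logits = head(features) # requires grad because head's weights do
logits.sum().backward()

# After backward, only the head has gradients; the base parameters have none.
frozen_has_no_grad = all(p.grad is None for p in base.parameters())
```

In this sketch features.requires_grad is False and logits.requires_grad is True, which matches the rule described in the quoted passage.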

Here is a small example that does this:

import torch
import torch.nn as nn

# define layers
normal_layer = nn.Linear(5, 5)
TransformationFunction = nn.Linear(5, 5)
# disable gradient computation for parameters of TransformationFunction
# here weight and bias
TransformationFunction.weight.requires_grad = False
TransformationFunction.bias.requires_grad   = False

# input 
inp = torch.rand(1, 5)

# do computation
out = normal_layer(inp)
out = TransformationFunction(out)

# loss
loss = torch.sum(out)
# backward
loss.backward()

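# gradients for normal_layer (populated, since its parameters require grad)
print('Gradients for "normal_layer"', normal_layer.weight.grad, normal_layer.bias.grad)
# gradients for TransformationFunction (both are None, since requires_grad is False)
print('Gradients for "TransformationFunction"', TransformationFunction.weight.grad, TransformationFunction.bias.grad)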
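Note that in the example above, gradients still flow through TransformationFunction back to normal_layer; only its own parameters are frozen. If the transformation has no parameters, as in the question, one possible alternative is to keep the transformed value in the forward pass while letting backward treat the transformation as the identity. The sketch below does this with detach(). The function transformation_function is a hypothetical placeholder, and whether an identity gradient is appropriate depends on the transformation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(5, 5)

def transformation_function(x):
    # Hypothetical parameter-free transformation, for illustration only.
    return torch.clamp(x, min=0.0) * 2.0

inp = torch.rand(1, 5)
out = layer(inp)

# Forward value equals transformation_function(out).
# The detached difference is a constant for autograd, so backward
# sees d(out_t)/d(out) = identity and skips the transformation.
out_t = out + (transformation_function(out) - out).detach()

loss = out_t.sum()
loss.backward()

# layer.bias.grad is all ones: the same as if no transformation were applied.
bias_grad = layer.bias.grad
```

Calling transformation_function(out.detach()) on its own would instead cut the graph completely, so layer would receive no gradient at all through that output.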

I hope this is what you were looking for. If not, please edit your question to add more detail!

Apply a transformation to a Tensor without including it in backward
