I have the following architecture (built with nngraph):
require 'nn'
require 'nngraph'
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())
output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})
target1 = torch.rand(20, 51, 51)
target2 = torch.rand(2, 49, 49)
target2[target2:gt(0.5)] = 1
target2[target2:lt(0.5)] = 0
-- Do a forward pass (keep the input tensor around: gMod:backward needs the tensor, not the nngraph node)
inputTensor = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(inputTensor))
cr1 = nn.MSECriterion()
cr1:forward(out1, target1)
gradient1 = cr1:backward(out1, target1)
cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)
-- Now update the weights for the networks
LR = 0.001
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)
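For reference, the target sizes match what the forward pass actually produces; here is a quick shape sanity check (nothing used here beyond the modules and variables defined above):
-- Each 3x3 / 4x4 SpatialConvolution shrinks the map by 2 / 3 pixels,
-- each SpatialFullConvolution grows it back by the same amount:
-- net1: 1x56x56 -> 5x54x54 -> 20x51x51   (matches target1)
-- net2: 20x51x51 -> 5x54x54 -> 1x56x56
-- net3: 1x56x56 -> 20x54x54 -> 40x51x51 -> 2x49x49   (matches target2)
print(out1:size())   -- 20x51x51
print(out2:size())   -- 2x49x49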
What I would like to know is:
1) How can I stop gradient2 from updating the weights of net1, so that it only updates the weights of net2 and net3?
2) How can I prevent gradient2 from updating the weights of net3, while still updating the weights of the other sub-networks?
Answer 0 (score: 1)
I found a solution to the problem. Below I post the relevant code for each question:
Question 1:
This is a bit tricky but completely doable. If the weights of net1 are not supposed to be updated by gradient2, you need to modify the updateGradInput() function of the layer sitting right after net1 (i.e. the first layer of net2) so that it outputs a zeroed tensor; this blocks gradient2 from flowing back into net1. This is done in the following code:
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())
-- Modifying the updateGradInput function so that it will output a zeroed-out tensor at the first layer of net2
local tempLayer = net2:get(1)
function tempLayer:updateGradInput(input, gradOutput)
self.gradInput:resizeAs(input):zero()
return self.gradInput
end
output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})
-- Everything else is the same ...
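To verify that the override really blocks gradient2 from reaching net1, backpropagate a zeroed gradient for output1 together with the real gradient2 and check that net1 accumulated nothing. A sketch, assuming inputTensor, target2 and cr2 are defined as in the question:
-- Sanity check: only the output3 branch gets a non-zero gradient here,
-- so with the override in place net1's gradParameters should stay zero.
gMod:zeroGradParameters()
out1, out2 = unpack(gMod:forward(inputTensor))
zeroGrad1 = out1:clone():zero()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)
gMod:backward(inputTensor, {zeroGrad1, gradient2})
_, net1GradParams = net1:parameters()
for _, g in ipairs(net1GradParams) do
   print(g:abs():max())   -- expected: 0 for every parameter tensor of net1
end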
Question 2:
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())
net3.updateParameters = function() end -- Overriding updateParameters with a no-op prevents net3's weights from being updated when gMod:updateParameters(LR) is called
output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})
-- Everything else is the same ...
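To check that the no-op updateParameters really freezes net3, snapshot its weights before the update and compare afterwards. A minimal sketch, assuming the forward/backward calls from the question have already been run on this new gMod:
-- Snapshot net3's parameters, apply the update, then confirm they did not move.
net3Params = net3:parameters()
net3Before = {}
for i, p in ipairs(net3Params) do
   net3Before[i] = p:clone()
end
gMod:updateParameters(LR)
for i, p in ipairs(net3Params) do
   print((p - net3Before[i]):abs():max())   -- expected: 0 for every tensor of net3
end
Note that this only disables the weight update itself; gradients still accumulate in net3's gradParameters during gMod:backward, so in a training loop you would still call gMod:zeroGradParameters() between iterations as usual.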
Answer 1 (score: -1)
Have you tried stopping the propagation at net1?
net1.updateGradInput = function(self, inp, out) end
net1.accGradParameters = function(self,inp, out) end
Just place this code right after gradient1 = cr1:backward(out1, target1).
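Placed into the question's snippet, that suggestion would look roughly like this (a sketch only; note the two overrides silence net1's entire backward pass, i.e. for gradient1's path as well, not just gradient2's):
gradient1 = cr1:backward(out1, target1)
-- Answer's suggestion: turn net1's backward contributions into no-ops
net1.updateGradInput = function(self, inp, out) end
net1.accGradParameters = function(self, inp, out) end
gradient2 = cr2:backward(out2, target2)
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)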