I am trying to share the parameters of the encoder/decoder sub-networks of one architecture with the encoder/decoder of another architecture. This is necessary for my problem, because at test time it takes a lot of computation (and time) to do a forward pass through the original architecture and then extract the decoder's output. However, I have noticed that although I explicitly request parameter sharing when calling clone(), the parameters are not shared, and each architecture ends up with its own parameters during training.
I verified this with a few print() statements that show the difference between the outputs of the two architectures: I forward-propagate some random vectors through the encoder and the decoder of both architectures and compare the results (which effectively compares their weights).
So I was wondering, could anyone help me figure out what I am doing wrong when sharing the parameters?
Below is a simplified version of my code:
require 'nn'
require 'nngraph'
require 'cutorch'
require 'cunn'
require 'optim'
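-- Training-time graph: an encoder/decoder autoencoder, plus a second path through a cloned decoder fed with outside codes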
input = nn.Identity()()
encoder = nn.Sequential():add(nn.Linear(100, 20)):add(nn.ReLU(true)):add(nn.Linear(20, 10))
decoder = nn.Sequential():add(nn.Linear(10, 20)):add(nn.ReLU(true)):add(nn.Linear(20, 100))
code = encoder(input)
reconstruction = decoder(code)
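-- Clone of the decoder, intended to share its weights/biases (and their gradient buffers) with the original decoder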
outsideCode = nn.Identity()()
decoderCloned = decoder:clone('weight', 'bias', 'gradWeight', 'gradBias')
outsideReconstruction = decoderCloned(nn.JoinTable(1)({code, outsideCode}))
dumbNet = nn.Sequential():add(nn.Linear(100, 10))
codeRecon = dumbNet(outsideReconstruction)
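-- Test-time graph built from clones of the same encoder/decoder, which should likewise share parameters with the training graph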
input2 = nn.Identity()()
encoderTestTime = encoder:clone('weight', 'bias', 'gradWeight', 'gradBias')
decoderTestTime = decoder:clone('weight', 'bias', 'gradWeight', 'gradBias')
codeTest = encoderTestTime(input2)
reconTest = decoderTestTime(codeTest)
gMod = nn.gModule({input, outsideCode}, {reconstruction, codeRecon})
gModTest = nn.gModule({input2}, {reconTest})
criterion1 = nn.BCECriterion()
criterion2 = nn.MSECriterion()
-- Okay, the module has been created. Now it's time to do some other stuff
params, gParams = gMod:getParameters()
numParams = params:nElement()
memReqForParams = numParams * 5 * 4 / 1024 / 1024 -- rough estimate in MB: 4 bytes per float; the factor of 5 presumably budgets for gradient and optimizer buffers
-- If enough memory on GPU, move stuff to the GPU
if memReqForParams <= 1000 then
gMod = gMod:cuda()
gModTest = gModTest:cuda()
criterion1 = criterion1:cuda()
criterion2 = criterion2:cuda()
params, gParams = gMod:getParameters()
end
-- Data
Data = torch.rand(200, 100):cuda()
Data[Data:gt(0.5)] = 1
Data[Data:lt(0.5)] = 0
fakeCodes = torch.rand(400, 10):cuda()
config = {learningRate = 0.001}
state = {}
-- Start training
print ("\nEncoders before training: \n\tgMod's Encoder: " .. gMod:get(2):forward(torch.ones(1, 100):cuda()):sum() .. "\n\tgModTest's Encoder: " .. gModTest:get(2):forward(torch.ones(1, 100):cuda()):sum())
print ("\nDecoders before training: \n\tgMod's Decoder: " .. gMod:get(3):forward(torch.ones(1, 10):cuda()):sum() .. "\n\tgModTest's Decoder: " .. gModTest:get(3):forward(torch.ones(1, 10):cuda()):sum())
gMod:training()
for i=1, Data:size(1) do
local opfunc = function(x)
if x ~= params then
params:copy(x)
end
gMod:zeroGradParameters()
recon, outsideRecon = unpack(gMod:forward({Data[{{i}}], fakeCodes[{{i}}]}))
err = criterion1:forward(recon, Data[{{i}}])
df_dw = criterion1:backward(recon, Data[{{i}}])
errFake = criterion2:forward(outsideRecon, fakeCodes[{{i*2-1, i * 2}}])
df_dwFake = criterion2:backward(outsideRecon, fakeCodes[{{i*2-1, i * 2}}])
errorGrads = {df_dw, df_dwFake}
gMod:backward({Data[{{i}}], fakeCodes[{{i}}]}, errorGrads) -- same inputs as the forward call above
return err, gParams
end
x, reconError = optim.adam(opfunc, params, config, state)
end
print ("\n\nEncoders after training: \n\tgMod's Encoder: " .. gMod:get(2):forward(torch.ones(1, 100):cuda()):sum() .. "\n\tgModTest's Encoder: " .. gModTest:get(2):forward(torch.ones(1, 100):cuda()):sum())
print ("\nDecoders after training: \n\tgMod's Decoder: " .. gMod:get(3):forward(torch.ones(1, 10):cuda()):sum() .. "\n\tgModTest's Decoder: " .. gModTest:get(3):forward(torch.ones(1, 10):cuda()):sum())
Answer 0 (score: 0)
With the help of the GitHub issue I opened for this question here, I got the solution from fmassa. The parameter-sharing problem can be solved with nn.Container, as shown below:
container = nn.Container()
container:add(gMod)
container:add(gModTest)
params, gradParams = container:getParameters()
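As a quick sanity check (a sketch assuming the same gMod/gModTest setup as above, mirroring the print() comparison from the question), you can perturb the flattened parameter vector returned by the container and confirm that both graphs now move together:

-- Nudge every shared parameter, then compare the two encoders again
params:add(0.01)
print(gMod:get(2):forward(torch.ones(1, 100):cuda()):sum())
print(gModTest:get(2):forward(torch.ones(1, 100):cuda()):sum()) -- should print the same value as the line above

Note that optim should then be driven with this container-level params/gradParams pair rather than the vector obtained earlier from gMod:getParameters(): every call to getParameters() re-allocates the flattened storage, which is presumably why flattening gMod alone broke the sharing with the clones in gModTest in the first place.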