How do I parallelize the regularization loss computation over multiple GPUs with PyTorch?

Asked: 2019-07-24 01:08:31

Tags: pytorch

After wrapping the model in DataParallel:

model20 = FourLayerConvNetWithPool()
model20 = nn.DataParallel(model20)

I can see utilization on all of the available GPUs. But on closer inspection, "cuda:0" still shows noticeably higher utilization, so it is a bottleneck (possibly a small one for large networks).
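A quick way to see why (a small check, assuming x is a batch already moved to the GPU as in the training snippet below): DataParallel scatters the input across the GPUs but gathers the outputs back onto the default device, so anything computed on scores after the forward pass runs only on "cuda:0".

scores = model20(x)   # the forward pass itself runs on all GPUs
print(scores.device)  # cuda:0 -- the gathered output lives on the default device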

Here is a snippet of the training iteration:

model.train()  # put model to training mode
x = x.to(device=self.device, dtype=self.dtype)  # move to device, e.g. GPU
y = y.to(device=self.device, dtype=torch.long)

scores = model(x)                  # DataParallel forward pass runs on all GPUs
loss = F.cross_entropy(scores, y)  # but the loss is computed on the gathered scores, on "cuda:0"

if reg > 0:
    # L2 penalty accumulated parameter by parameter; this loop runs entirely on self.device
    l2_regularization = torch.tensor(0).to(device=self.device, dtype=self.dtype)
    for param in model.parameters():
        l2_regularization += torch.norm(param, 2)
    loss += reg * l2_regularization

# Zero out all of the gradients for the variables which the optimizer
# will update.
optimizer.zero_grad()

# This is the backwards pass: compute the gradient of the loss with
# respect to each parameter of the model.
loss.backward()

# Actually update the parameters of the model using the gradients
# computed by the backwards pass.
optimizer.step()

By commenting out individual parts of the snippet above, I found that F.cross_entropy and torch.norm are what cause the gap between "cuda:0" and "cuda:1". That makes sense, since those parts are not parallelized.

I understand that I can move the F.cross_entropy computation into the model itself (its forward method), for example:
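Something along these lines (a rough sketch; the ModelWithLoss wrapper is just a placeholder name of mine):

import torch.nn as nn
import torch.nn.functional as F

class ModelWithLoss(nn.Module):
    # Wrapper that computes the data loss inside forward, so that each
    # DataParallel replica evaluates cross-entropy on its own chunk of the
    # batch, on its own GPU.
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, x, y):
        scores = self.net(x)
        # return per-sample losses; DataParallel gathers them on "cuda:0",
        # where only the cheap .mean() is left to do
        return F.cross_entropy(scores, y, reduction='none')

# usage:
# model = nn.DataParallel(ModelWithLoss(FourLayerConvNetWithPool())).to(device)
# loss = model(x, y).mean()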

But what about the L2 regularization loss?
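The naive analogue would be to move the norm computation into forward as well, roughly as sketched below (ModelWithLossAndReg is again just my placeholder name). But every DataParallel replica holds a full copy of the parameters, so I suspect this simply repeats the whole norm loop on every GPU instead of splitting it. Is there a proper way to parallelize this term?

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelWithLossAndReg(nn.Module):
    # Variant of the wrapper above that also accumulates the L2 term inside forward.
    def __init__(self, net, reg):
        super().__init__()
        self.net = net
        self.reg = reg

    def forward(self, x, y):
        scores = self.net(x)
        data_loss = F.cross_entropy(scores, y, reduction='none')
        # each replica computes the norm of its own (full) copy of the parameters
        l2 = scores.new_zeros(())
        for param in self.net.parameters():
            l2 = l2 + torch.norm(param, 2)
        # broadcast the scalar penalty onto the per-sample losses;
        # loss = model(x, y).mean() outside then gives data loss + reg * l2
        return data_loss + self.reg * l2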

0 Answers:

No answers yet