Question

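In the forward method, if I do only one torch.add(torch.bmm(x, exp_w), self.b), my model backpropagates correctly. When I add another layer, torch.add(torch.bmm(out, exp_w2), self.b2), the gradients are not updated and the model does not learn. If I change the activation function from nn.Sigmoid to nn.ReLU, it works with both layers.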

I've been thinking about this for a day now and still can't figure out why it doesn't work with nn.Sigmoid.

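I've tried different learning rates, loss functions and optimizers, but no combination seems to work. When I sum the weights before and after training, the totals are identical.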

Code:

import torch
import torch.nn as nn

class MyModel(nn.Module):

    def __init__(self, input_dim, output_dim):
        torch.manual_seed(1)
        super(MyModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        hidden_1_dimentsions = 20
        # Layer 1: every weight and bias is drawn from U(0, 1)
        self.w = torch.nn.Parameter(torch.empty(input_dim, hidden_1_dimentsions).uniform_(0, 1))
        self.b = torch.nn.Parameter(torch.empty(hidden_1_dimentsions).uniform_(0, 1))

        # Layer 2: same U(0, 1) initialization
        self.w2 = torch.nn.Parameter(torch.empty(hidden_1_dimentsions, output_dim).uniform_(0, 1))
        self.b2 = torch.nn.Parameter(torch.empty(output_dim).uniform_(0, 1))

    def activation(self):
        return torch.nn.Sigmoid()

    def forward(self, x):
        # Reshape to (batch, 1, input_dim) so that torch.bmm can be used
        x = x.view((x.shape[0], 1, self.input_dim))

        # Layer 1: broadcast w across the batch, then compute x @ w + b
        exp_w = self.w.expand(x.shape[0], self.w.size(0), self.w.size(1))
        out = torch.add(torch.bmm(x, exp_w), self.b)
        # Layer 2: applied directly, with no activation between the two layers
        exp_w2 = self.w2.expand(out.shape[0], self.w2.size(0), self.w2.size(1))
        out = torch.add(torch.bmm(out, exp_w2), self.b2)
        out = self.activation()(out)
        # Flatten to shape (batch,); this assumes output_dim == 1
        return out.view(x.shape[0])
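The symptom described in the question (weights identical before and after training) can be checked directly by rebuilding the same two-layer computation and inspecting the gradients after one backward pass. This is a sketch under assumptions that are not in the question: input_dim=50, a batch of 8 non-negative random inputs, random targets in [0, 1), and MSE loss. The helper name grad_magnitudes is hypothetical.

```python
import torch
import torch.nn as nn


def grad_magnitudes(activation, input_dim=50, hidden=20, batch=8):
    """Rebuild the question's two-layer bmm model and return max |grad| per parameter."""
    torch.manual_seed(1)
    # Same initialization as in the question: every weight and bias in [0, 1).
    w = nn.Parameter(torch.empty(input_dim, hidden).uniform_(0, 1))
    b = nn.Parameter(torch.empty(hidden).uniform_(0, 1))
    w2 = nn.Parameter(torch.empty(hidden, 1).uniform_(0, 1))
    b2 = nn.Parameter(torch.empty(1).uniform_(0, 1))

    # Hypothetical data: non-negative inputs and targets in [0, 1).
    x = torch.rand(batch, 1, input_dim)
    target = torch.rand(batch)

    # Two layers, expanded weights + bmm, as in the question's forward().
    out = torch.bmm(x, w.expand(batch, -1, -1)) + b
    out = torch.bmm(out, w2.expand(batch, -1, -1)) + b2
    out = activation(out).view(batch)

    loss = nn.functional.mse_loss(out, target)
    loss.backward()
    return {name: p.grad.abs().max().item()
            for name, p in [("w", w), ("b", b), ("w2", w2), ("b2", b2)]}


print("Sigmoid:", grad_magnitudes(torch.sigmoid))
print("ReLU:   ", grad_magnitudes(torch.relu))
```

Under these assumptions the Sigmoid gradients should all come out as exactly zero: the output pre-activation lands far beyond the point where a float32 sigmoid rounds to 1.0, and the sigmoid gradient s * (1 - s) is then 0. With ReLU, the same parameters receive nonzero gradients, which matches what the question reports.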

Answer 1

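Besides the loss function, activation function and learning rate, parameter initialization also matters. I suggest you look at Xavier initialization: https://pytorch.org/docs/stable/nn.html#torch.nn.init.xavier_uniform_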
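A minimal sketch of that suggestion, keeping the question's expand + bmm structure: the weights are created empty and filled with torch.nn.init.xavier_uniform_, and the biases start at zero (a common convention, not something the answer specifies). The class name MyModelXavier and the layer sizes in the usage lines are hypothetical.

```python
import math
import torch
import torch.nn as nn


class MyModelXavier(nn.Module):
    """The question's two-layer bmm model, but with Xavier-initialized weights."""

    def __init__(self, input_dim, output_dim, hidden=20):
        super().__init__()
        self.input_dim = input_dim
        self.w = nn.Parameter(torch.empty(input_dim, hidden))
        self.b = nn.Parameter(torch.zeros(hidden))        # assumption: zero biases
        self.w2 = nn.Parameter(torch.empty(hidden, output_dim))
        self.b2 = nn.Parameter(torch.zeros(output_dim))
        # Xavier/Glorot uniform draws from U(-a, a), a = sqrt(6 / (fan_in + fan_out)),
        # so the weights are zero-centred instead of all positive.
        nn.init.xavier_uniform_(self.w)
        nn.init.xavier_uniform_(self.w2)

    def forward(self, x):
        batch = x.shape[0]
        x = x.view(batch, 1, self.input_dim)
        out = torch.bmm(x, self.w.expand(batch, -1, -1)) + self.b
        out = torch.bmm(out, self.w2.expand(batch, -1, -1)) + self.b2
        # (batch, 1, output_dim) -> (batch, output_dim)
        return torch.sigmoid(out).squeeze(1)


model = MyModelXavier(input_dim=50, output_dim=1)
bound = math.sqrt(6 / (50 + 20))
print(model.w.abs().max().item() <= bound + 1e-6)
print(model(torch.rand(8, 50)).shape)
```

Because the weights have mean zero, the pre-activation no longer grows systematically with the number of inputs feeding each unit, so the sigmoid is much less likely to start out saturated.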

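In addition, for a wide range of problems and network architectures, batch normalization helps: it normalizes your activations to zero mean and unit standard deviation. https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm1d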
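The sketch below swaps the hand-written expand + bmm layers for nn.Linear (which computes x @ W.T + b over the whole batch, the same thing the question builds by hand) and places nn.BatchNorm1d after the first layer. The layer sizes and the deliberately large, all-positive inputs are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 50 input features, 20 hidden units, 1 output.
layer1 = nn.Linear(50, 20)   # equivalent to bmm with an expanded weight, plus bias
bn = nn.BatchNorm1d(20)      # normalizes each of the 20 hidden units over the batch
layer2 = nn.Linear(20, 1)

x = torch.rand(64, 50) * 10  # large, all-positive inputs
h = bn(layer1(x))            # modules start in training mode, so batch statistics are used
y = torch.sigmoid(layer2(h)).squeeze(1)

# Per-unit mean and (biased) variance of the normalized hidden activations.
mean = h.mean(dim=0)
var = ((h - mean) ** 2).mean(dim=0)
print(mean.abs().max().item())  # close to 0
print(var.mean().item())        # close to 1
```

In training mode BatchNorm1d normalizes with the current batch's statistics (followed by a learnable scale and shift, which start at 1 and 0); calling eval() on the module switches it to the running statistics collected during training.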

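If you want to know more about why this happens: it is mainly due to the vanishing gradient problem, meaning your gradients become so small that the weights effectively cannot be updated. It is common enough to have its own Wikipedia page: https://en.wikipedia.org/wiki/Vanishing_gradient_problem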
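A short numeric illustration of why the sigmoid in particular is affected. Every gradient in the model passes through the sigmoid's derivative, σ'(z) = σ(z)(1 − σ(z)), which is at most 0.25 and collapses toward zero as z grows, while ReLU's derivative is 1 for any positive z. In the question's model every weight and bias lies in [0, 1), so (assuming non-negative inputs) z is a sum of many positive terms, and the second layer multiplies that sum again by a sum of 20 positive weights; this plausibly explains why one layer trains and two do not. The helper names below are hypothetical, and the code uses plain Python floats (float64) rather than PyTorch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    # Derivative written in terms of the output: s * (1 - s); maximum 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    # ReLU passes the gradient through unchanged for any positive input.
    return 1.0 if z > 0 else 0.0


zs = [0.0, 2.0, 5.0, 10.0, 20.0, 40.0]
for z in zs:
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_grad(z):.3e}   relu'(z) = {relu_grad(z):.1f}")
```

At z = 40 the computed sigmoid gradient is exactly 0.0, because 1 + e^(-40) rounds to 1.0 in double precision, so s becomes exactly 1.0. In float32, PyTorch's default dtype, this rounding happens at much smaller z, which is consistent with weights that do not change at all.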

Multiple matrix multiplication loses weight updates
