I am trying to implement an attention module using 1D convolutions. Here is my custom attention module:
import torch
import torch.nn as nn

class convSelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(convSelfAttention, self).__init__()
        self.channel_in = in_dim
        # 1x1 convolutions that produce the query, key, and value projections
        self.query_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
        self.key_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
        self.value_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        # learnable scale on the attention branch of the residual connection
        self.gamma = nn.Parameter(torch.zeros(1))
        self.activ = nn.Softmax(dim=-1)

    def forward(self, x):
        # x: (batch, channels, length)
        p_query = self.query_conv(x).permute(0, 2, 1)          # (B, L, C//2)
        p_key = self.key_conv(x)                               # (B, C//2, L)
        energy = torch.bmm(p_query, p_key)                     # (B, L, L)
        attention = self.activ(energy)
        p_value = self.value_conv(x)                           # (B, C, L)
        out = torch.bmm(p_value, attention.permute(0, 2, 1))   # (B, C, L)
        out = self.gamma * out + x                             # residual connection
        out = out.permute(0, 2, 1)                             # (B, L, C)
        return out
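For context, the module expects inputs shaped (batch, channels, length); a minimal usage sketch (the sizes below are placeholders, not my real data) looks like this:

attn = convSelfAttention(in_dim=64)
x = torch.randn(8, 64, 100)      # (batch, channels, length), placeholder sizes
y = attn(x)                      # (batch, length, channels) after the final permute
print(y.shape)                   # torch.Size([8, 100, 64])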
However, the gradients of every parameter except `self.gamma` are 0, and this does not change even if I remove the softmax.
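Here is a minimal sketch of the kind of check that shows this (the dummy input and the .sum() loss are placeholders, not my actual training loop):

attn = convSelfAttention(in_dim=64)
x = torch.randn(8, 64, 100)
loss = attn(x).sum()             # placeholder loss
loss.backward()
for name, p in attn.named_parameters():
    print(name, p.grad.abs().max().item())
# every parameter except 'gamma' reports a gradient of 0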
For reference, I plotted the gradient flow of the whole architecture (the last two layers are fully connected, and all layers in '~~ Encoder' are convolutions).
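Gradient-flow plots of this kind are usually made with a small matplotlib helper that records the mean absolute gradient of every named parameter after loss.backward(); a rough sketch of such a helper (an assumption about the plotting code, not my exact script) is:

import matplotlib.pyplot as plt

def plot_grad_flow(named_parameters):
    # collect the mean absolute gradient of every trainable parameter
    names, avg_grads = [], []
    for name, p in named_parameters:
        if p.requires_grad and p.grad is not None:
            names.append(name)
            avg_grads.append(p.grad.abs().mean().item())
    plt.plot(avg_grads, alpha=0.5, color="b")
    plt.xticks(range(len(names)), names, rotation="vertical")
    plt.xlabel("Layers")
    plt.ylabel("Average gradient")
    plt.title("Gradient flow")
    plt.tight_layout()
    plt.show()

# called once after loss.backward(), e.g. plot_grad_flow(model.named_parameters())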