I am trying to implement an attention module using 1D convolutions. Here is my custom attention module:
import torch
import torch.nn as nn

class convSelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(convSelfAttention, self).__init__()
        self.channel_in = in_dim
        # 1x1 convolutions that produce the query, key, and value projections
        self.query_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
        self.key_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim // 2, kernel_size=1)
        self.value_conv = nn.Conv1d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        # learnable scale on the attention branch of the residual connection
        self.gamma = nn.Parameter(torch.zeros(1))
        self.activ = nn.Softmax(dim=-1)

    def forward(self, x):
        # x: (batch, channels, length)
        p_query = self.query_conv(x).permute(0, 2, 1)          # (B, L, C//2)
        p_key = self.key_conv(x)                               # (B, C//2, L)
        energy = torch.bmm(p_query, p_key)                     # (B, L, L)
        attention = self.activ(energy)
        p_value = self.value_conv(x)                           # (B, C, L)
        out = torch.bmm(p_value, attention.permute(0, 2, 1))   # (B, C, L)
        out = self.gamma * out + x                             # residual connection
        out = out.permute(0, 2, 1)                             # (B, L, C)
        return out
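For context, the module expects inputs shaped (batch, channels, length); a minimal usage sketch (the sizes below are placeholders, not my real data) looks like this:

attn = convSelfAttention(in_dim=64)
x = torch.randn(8, 64, 100)      # (batch, channels, length), placeholder sizes
y = attn(x)                      # (batch, length, channels) after the final permute
print(y.shape)                   # torch.Size([8, 100, 64])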
However, the gradients of every parameter except `self.gamma` are 0, and this does not change even if I remove the softmax.
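Here is a minimal sketch of the kind of check that shows this (the dummy input and the .sum() loss are placeholders, not my actual training loop):

attn = convSelfAttention(in_dim=64)
x = torch.randn(8, 64, 100)
loss = attn(x).sum()             # placeholder loss
loss.backward()
for name, p in attn.named_parameters():
    print(name, p.grad.abs().max().item())
# every parameter except 'gamma' reports a gradient of 0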
For reference, I plotted the gradient flow of the whole architecture (the last two layers are fully connected, and all layers in '~~ Encoder' are convolutions).
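Gradient-flow plots of this kind are usually made with a small matplotlib helper that records the mean absolute gradient of every named parameter after loss.backward(); a rough sketch of such a helper (an assumption about the plotting code, not my exact script) is:

import matplotlib.pyplot as plt

def plot_grad_flow(named_parameters):
    # collect the mean absolute gradient of every trainable parameter
    names, avg_grads = [], []
    for name, p in named_parameters:
        if p.requires_grad and p.grad is not None:
            names.append(name)
            avg_grads.append(p.grad.abs().mean().item())
    plt.plot(avg_grads, alpha=0.5, color="b")
    plt.xticks(range(len(names)), names, rotation="vertical")
    plt.xlabel("Layers")
    plt.ylabel("Average gradient")
    plt.title("Gradient flow")
    plt.tight_layout()
    plt.show()

# called once after loss.backward(), e.g. plot_grad_flow(model.named_parameters())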