Question

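I have a sequence of 12 words, which I represent as a 12x256 matrix (using word embeddings). Let's call them $e_1,\dots,e_{12}$. I want to take this as input and output a 1x256 vector. However, I don't want to use a (12x256) x 256 dense layer. Instead, I want to create the output embedding as a weighted sum of the 12 embeddings: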

$w_1 e_1 + \dots + w_{12} e_{12}$

where each $w_i$ is a scalar (so there is weight sharing: the same $w_i$ is applied to all 256 components of $e_i$).

How can I create trainable $w_i$ in PyTorch? I'm new to it and only familiar with standard modules like nn.Linear.

Answer 1

You can implement this with a one-dimensional convolution with kernel_size = 1.


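Such a convolution has 12 weight parameters (plus one bias term, unless bias=False is passed). Each weight plays the role of $w_i$ in the formula you provided.

In other words, the convolution runs along the dimension of size 256 and, at each position, sums the 12 embedding values with learnable weights.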
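A minimal sketch of the kernel-size-1 convolution described above. The batch size of 2, the variable names, and the choice of `bias=False` (so that the layer holds exactly the 12 weights) are assumptions made for illustration, not part of the original answer.

```python
import torch
from torch import nn

batch_size = 2  # assumed for illustration
# Conv1d expects (batch, channels, length): 12 embeddings as channels, 256 as length.
inputs = torch.randn(batch_size, 12, 256)

# One output channel and kernel width 1: each output position is a weighted
# sum over the 12 input channels. bias=False leaves exactly 12 parameters.
aggregation_layer = nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1, bias=False)

weighted_sum = aggregation_layer(inputs)  # shape: (batch_size, 1, 256)
print(weighted_sum.shape)

# Cross-check against the explicit formula w_1 e_1 + ... + w_12 e_12.
w = aggregation_layer.weight  # shape: (1, 12, 1)
manual = (w * inputs).sum(dim=1, keepdim=True)
print(torch.allclose(weighted_sum, manual, atol=1e-5))
```

Calling `weighted_sum.squeeze(1)` would drop the singleton channel dimension if a `(batch_size, 256)` tensor is preferred.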

Answer 2

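This should do the trick for the weighted sum (the weights are not normalized, so it is a sum rather than a true average):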

import torch
from torch import nn


class LinearWeightedAvg(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        # One trainable scalar weight per input embedding.
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(1)) for _ in range(n_inputs)])

    def forward(self, input):
        # Accumulate w_i * e_i over the first dimension of the input.
        res = 0
        for emb_idx, emb in enumerate(input):
            res += emb * self.weights[emb_idx]
        return res


example_data = torch.rand(12, 256)
wa_layer = LinearWeightedAvg(12)
res = wa_layer(example_data)
print(res.shape)  # torch.Size([256])

This answer was inspired by an earlier answer I received on the PyTorch forums:
https://discuss.pytorch.org/t/dense-layer-with-different-inputs-for-each-neuron/47348
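The same weighted sum can probably be written without the Python loop by storing all weights in a single `nn.Parameter` and relying on broadcasting. The sketch below is one possible variant, not the original author's code; the class name `VectorizedWeightedSum` and the batched example are hypothetical.

```python
import torch
from torch import nn


class VectorizedWeightedSum(nn.Module):  # hypothetical name, for illustration
    def __init__(self, n_inputs):
        super().__init__()
        # All n_inputs scalar weights live in one trainable tensor.
        self.weights = nn.Parameter(torch.randn(n_inputs))

    def forward(self, inputs):
        # inputs: (..., n_inputs, dim). Broadcast the weights over dim,
        # then sum over the n_inputs axis to get (..., dim).
        return (self.weights.unsqueeze(-1) * inputs).sum(dim=-2)


layer = VectorizedWeightedSum(12)

single = layer(torch.rand(12, 256))      # one sequence
batched = layer(torch.rand(4, 12, 256))  # a batch of 4 sequences
print(single.shape, batched.shape)

# The weights receive gradients, so an optimizer can train them.
single.sum().backward()
print(layer.weights.grad.shape)
```

Because the weights are an `nn.Parameter`, they appear in `layer.parameters()` and are updated by any standard optimizer, just like the weights of `nn.Linear`.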

Weighted summation of embeddings in PyTorch
