Question

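I am using the encoder part of a Transformer to convert a stream of word tokens (text) into a vector. This vector is then fed into an LSTM.

The figure above illustrates the concept. My question is: how do I make sure TensorFlow reuses the same model (layers, weights, etc.) at every time step?

This is how the Transformer is applied: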

Xn_ts = []

for t in range(self.T):
    xt = self.Xn[:, t, :] # shape of Xn = (None,timesteps,textlength)
    enc = self.transformer.encode(xt)
    vec = tf.expand_dims(tf.reduce_mean(enc, axis=1), axis=1)
    Xn_ts.append(vec)

XnC = tf.concat(Xn_ts, axis=1)
# XnC has shape (None, timesteps, d_model) and is then fed to an LSTM
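
One way to sidestep per-time-step reuse altogether is to fold the time axis into the batch axis and call the encoder only once. The following is a minimal sketch of that idea, not the code from the question: `toy_encode` is a hypothetical stand-in for `self.transformer.encode` (just an embedding lookup), the sizes `T`, `L`, `VOCAB`, and `D` are made up, and it assumes the encoder treats batch rows independently, which would not hold exactly once dropout is active.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API, as used in the question

# Hypothetical sizes: timesteps, text length, vocabulary size, model width.
T, L, VOCAB, D = 4, 5, 50, 8


def toy_encode(xs):
    """Hypothetical stand-in for self.transformer.encode: embedding lookup only."""
    with tf1.variable_scope("encoder", reuse=tf1.AUTO_REUSE):
        emb = tf1.get_variable("embeddings", [VOCAB, D])
        return tf.nn.embedding_lookup(emb, xs)  # (batch, L, D)


graph = tf.Graph()
with graph.as_default():
    Xn = tf1.placeholder(tf.int32, [None, T, L])

    # Variant 1: the loop from the question, one encoder call per time step.
    steps = [
        tf.reduce_mean(toy_encode(Xn[:, t, :]), axis=1, keepdims=True)
        for t in range(T)
    ]
    looped = tf.concat(steps, axis=1)  # (batch, T, D)

    # Variant 2: merge batch and time, encode once, then split them again.
    flat = tf.reshape(Xn, [-1, L])  # (batch * T, L)
    pooled = tf.reduce_mean(toy_encode(flat), axis=1)  # (batch * T, D)
    merged = tf.reshape(pooled, [-1, T, D])  # (batch, T, D)

    n_vars = len(tf1.trainable_variables())

    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        tokens = np.random.randint(0, VOCAB, size=(3, T, L))
        a, b = sess.run([looped, merged], {Xn: tokens})

print(a.shape, n_vars, np.allclose(a, b))
```

In this toy setting both variants share a single `encoder/embeddings` variable and produce the same tensor; whether that carries over to the full encoder depends on how `multihead_attention` and `ff` create their variables.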

The encode function is as follows:

def encode(self, xs, training=True):
    with tf.variable_scope("encoder", reuse=tf.AUTO_REUSE):

        enc = tf.nn.embedding_lookup(self.embeddings, xs)
        enc *= self.hp.d_model**0.5

        enc += positional_encoding(enc, self.hp.maxlen)
        enc = tf.layers.dropout(enc, self.hp.dropout_rate, training=training)

        for i in range(self.hp.num_blocks):
            with tf.variable_scope("num_blocks_{}".format(i), reuse=tf.AUTO_REUSE):
                enc = multihead_attention(queries=enc,
                                          keys=enc,
                                          values=enc,
                                          num_heads=self.hp.num_heads,
                                          dropout_rate=self.hp.dropout_rate,
                                          training=training,
                                          causality=False)

                enc = ff(enc, num_units=[self.hp.d_ff, self.hp.d_model])

    memory = enc

    return memory

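In theory, enabling reuse in the encoder scope should be enough, but when I increase the number of time steps, the number of trainable parameters also increases, so I am clearly missing something.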
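
To narrow down where the extra parameters come from, it may help to build the graph for two different numbers of time steps and compare the names of the trainable variables. The sketch below shows the check on a toy encoder; `toy_encode` and `count_trainable` are hypothetical helpers, not part of the code above, and the toy creates every variable through an explicitly named `tf.get_variable` call, which is an assumption that may not match what `multihead_attention` and `ff` do internally.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API, as used in the question


def toy_encode(xt, vocab=50, d_model=8):
    """Hypothetical stand-in for encode(): every variable has an explicit name."""
    with tf1.variable_scope("encoder", reuse=tf1.AUTO_REUSE):
        emb = tf1.get_variable("embeddings", [vocab, d_model])
        proj = tf1.get_variable("proj", [d_model, d_model])
        enc = tf.nn.embedding_lookup(emb, xt)  # (batch, textlength, d_model)
        return tf.tensordot(enc, proj, axes=1)  # (batch, textlength, d_model)


def count_trainable(T, textlength=5):
    """Build the per-time-step loop in a fresh graph and list its variables."""
    with tf.Graph().as_default():
        Xn = tf1.placeholder(tf.int32, [None, T, textlength])
        steps = [
            tf.reduce_mean(toy_encode(Xn[:, t, :]), axis=1, keepdims=True)
            for t in range(T)
        ]
        tf.concat(steps, axis=1)  # (batch, T, d_model), would feed the LSTM
        variables = tf1.trainable_variables()
        names = sorted(v.name for v in variables)
        total = sum(v.shape.num_elements() for v in variables)
        return names, total


names_2, total_2 = count_trainable(T=2)
names_6, total_6 = count_trainable(T=6)

print(names_2, total_2)
print(names_6, total_6)
# Any name that appears only for the larger T points at a variable
# that is being re-created instead of reused.
print(sorted(set(names_6) - set(names_2)))
```

In the toy case the two name lists are identical, so the parameter count does not depend on `T`. Running the same comparison on the real model should show which variable names are new at the larger `T`, and therefore which layer is not being shared.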

Using the same model multiple times with the same weights in TensorFlow

0 answers: