Question

I am implementing a model that follows the classic encoder/decoder pattern, and I have several problems with the decoder implementation.

Goal
Here is pseudocode for what I intend to do.

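pi = zeros((batch, n), dtype=int)   # index of the node chosen at each step
log_proba = zeros((batch, n, n))    # log-probabilities for each step
for t in range(n):
    embedded_node[mask] = -inf      # embedded_node.shape == mask.shape
    h = some_function(embedded_node)
    h[mask] = -inf                  # already-chosen nodes must never win the argmax
    log_proba[:, t] = log_softmax(h)
    pi[:, t] = argmax(log_proba[:, t], axis=-1)
    mask[range(batch), pi[:, t]] = True
# sum the log-probability of every chosen node
loss = sum(log_proba[b, t, pi[b, t]] for b in range(batch) for t in range(n))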
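
For concreteness, the pseudocode above could be made runnable roughly as follows. This is a forward-only NumPy sketch, not the actual model: it assumes masked entries are meant to be -inf so they can never be selected, and it uses a fixed `scores` array as a hypothetical stand-in for the output of `some_function(embedded_node)`.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis; -inf entries stay -inf.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def greedy_decode(scores):
    """scores: (batch, n) array, a hypothetical stand-in for some_function's output."""
    batch, n = scores.shape
    rows = np.arange(batch)
    pi = np.zeros((batch, n), dtype=np.int64)   # chosen node per step
    log_proba = np.zeros((batch, n, n))         # log-probabilities per step
    mask = np.zeros((batch, n), dtype=bool)     # True = node already chosen
    for t in range(n):
        h = np.where(mask, -np.inf, scores)     # exclude already-chosen nodes
        log_proba[:, t] = log_softmax(h)
        pi[:, t] = log_proba[:, t].argmax(axis=-1)
        mask[rows, pi[:, t]] = True
    # chosen[b, t] = log_proba[b, t, pi[b, t]]
    chosen = log_proba[rows[:, None], np.arange(n), pi]
    return pi, chosen.sum()

scores = np.array([[1.0, 3.0, 2.0], [0.5, 0.1, 0.9]])
pi, loss = greedy_decode(scores)
```

Each row of `pi` ends up as a permutation of the node indices, which is a quick sanity check that the mask is doing its job.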

I need to be able to backpropagate through the loss.

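What I have done so far
To be able to update pi and mask, both have to be variables. However, they need to be re-initialized to 0 on every run. The only way I have found to do this is to create them as local_variable and to create an initialization op that I run before each batch.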
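
One possible alternative to variables, sketched here under the assumption of TensorFlow 2.x eager execution: build `pi` and `mask` as ordinary tensors inside an unrolled Python loop, so they start fresh on every call and no initialization op is needed. `decode`, `score_fn` (a stand-in for `some_function`) and the finite fill value `NEG` are hypothetical names chosen for illustration.

```python
import tensorflow as tf

NEG = -1e9  # assumption: large finite stand-in for -inf, avoids 0 * inf = NaN in gradients

def decode(embedded, score_fn):
    """Greedy masked decoding with plain tensors; score_fn stands in for some_function."""
    batch, n = embedded.shape
    rows = tf.range(batch)
    mask = tf.zeros([batch, n], dtype=tf.bool)  # rebuilt on every call; True = already picked
    neg = tf.fill([batch, n], tf.constant(NEG, embedded.dtype))
    picks, picked_logp = [], []
    for _ in range(n):
        h = score_fn(tf.where(mask, neg, embedded))
        logp = tf.nn.log_softmax(tf.where(mask, neg, h), axis=-1)
        idx = tf.argmax(logp, axis=-1, output_type=tf.int32)  # (batch,)
        picks.append(idx)
        # log-probability of the chosen node in each row
        picked_logp.append(tf.gather_nd(logp, tf.stack([rows, idx], axis=1)))
        # functional mask update instead of in-place assignment
        mask = tf.logical_or(mask, tf.cast(tf.one_hot(idx, n), tf.bool))
    pi = tf.stack(picks, axis=1)
    loss = tf.reduce_sum(tf.stack(picked_logp, axis=1))
    return pi, loss

w = tf.Variable(2.0)
x = tf.constant([[1.0, 3.0, 2.0], [0.5, 0.1, 0.9]])
with tf.GradientTape() as tape:
    pi, loss = decode(x, lambda e: w * e)
grad = tape.gradient(loss, w)
```

The argmax itself has no gradient, but the loss only depends on the gathered log-probabilities, so a gradient with respect to `w` is still available.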

To temporarily assign -inf, I use the following:

indices = tf.cast(tf.where(tf.logical_not(mask)), tf.int64)
val_masked = tf.SparseTensor(indices, 
                             tf.broadcast_to(tf.float32.min, [self.conf.batch * t]),
                             (self.conf.batch, self.conf.n_node))
u_c_n = tf.add(u_c_n, tf.sparse.to_dense(val_masked))

Is there another way to do this?
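
One commonly used alternative is a dense `tf.where`, which replaces the selected entries directly and does not require knowing how many entries are masked (a `SparseTensor` needs exactly one value per index). The sketch below is only an illustration: `mask_scores`, the `visited` argument and the finite fill value are hypothetical, and it assumes `visited` is True where a score should be suppressed, which may be the opposite polarity of `mask` in the snippet above.

```python
import tensorflow as tf

def mask_scores(u_c_n, visited, fill=-1e9):
    """Replace entries where `visited` is True with `fill`; other entries pass through."""
    # Full-shape fill tensor, so this does not rely on tf.where broadcasting.
    fill_tensor = tf.ones_like(u_c_n) * fill
    return tf.where(visited, fill_tensor, u_c_n)

u = tf.constant([[1.0, 2.0, 3.0]])
v = tf.constant([[False, True, False]])
masked = mask_scores(u, v)
```

Gradients flow only through the entries taken from `u_c_n`; the replaced entries contribute none.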

I managed to get the model to run, but training does not work, and I suspect the problem lies in that part of my code.

Thanks a lot!

Tensorflow: iterative decoding with mask

0 answers: