Question

I get this error from seq2seq.sequence_loss even though the first dimension of logits and labels is the same, i.e. batchSize.

I created a seq2seq model in TF 1.0. My loss function is as follows:

    logits  = self.decoder_logits_train
    targets = self.decoder_train_targets
    self.loss     = seq2seq.sequence_loss(logits=logits, targets=targets, weights=self.loss_weights)
    self.train_op = tf.train.AdamOptimizer().minimize(self.loss)

When I run the network during training, I get the following error:

InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [1280,150000] and labels shape [1536]
     [[Node: sequence_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](sequence_loss/Reshape, sequence_loss/Reshape_1)]]

I confirmed the shapes of the logits and targets tensors as follows:

a,b = sess.run([model.decoder_logits_train, model.decoder_train_targets], feed_dict)
print(np.shape(a)) # (128, 10, 150000) which is (BatchSize, MaxSeqSize, Vocabsize)
print(np.shape(b)) # (128, 12) which is (BatchSize, Max length of seq including padding)

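So, since the first dimension of targets and logits is the same, why am I getting this error?

Interestingly, the error reports the logits shape as (1280, 150000), which is (128 * 10, 150000) [product of the first two dimensions, vocab_size], and the labels shape as (1536), which is (128 * 12), again the product of the first two dimensions.

Note: Tensorflow 1.0, CPU version.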

Answer 1

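The error message is a bit misleading: you actually need both the first and the second dimension to match. The sequence_loss documentation states:

logits: A Tensor of shape [batch_size, sequence_length, num_decoder_symbols] and dtype float. The logits correspond to the prediction across all classes at each timestep.

targets: A Tensor of shape [batch_size, sequence_length] and dtype int. The target represents the true class at each timestep.

This also makes sense: logits hold the (unnormalized) class scores for each timestep and targets hold the true class for each timestep, so both need the same sequence length. In your case the logits have 10 timesteps and the targets have 12.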
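The shapes in the error message can be reproduced with a small NumPy sketch. It assumes that sequence_loss merges the batch and time axes before calling the sparse cross-entropy op, which is what the sequence_loss/Reshape node names in the traceback suggest. The vocabulary size is shrunk to 7 only to keep the arrays small.

```python
import numpy as np

batch_size, vocab_size = 128, 7        # vocab_size is illustrative; the question uses 150000
logits_steps, target_steps = 10, 12    # time lengths reported in the question

logits = np.zeros((batch_size, logits_steps, vocab_size), dtype=np.float32)
targets = np.zeros((batch_size, target_steps), dtype=np.int32)

# Assumed flattening step: merge the batch and time axes.
flat_logits = logits.reshape(-1, vocab_size)
flat_targets = targets.reshape(-1)

print(flat_logits.shape)   # (1280, 7)
print(flat_targets.shape)  # (1536,)
```

After flattening, the "first dimension" is batch_size * sequence_length, so a mismatch in the time axis surfaces as 1280 versus 1536.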

Answer 2

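Maybe the problem is the way you pad. If you append _EOS to the end of the target sequence, then max_length (the actual length of the target sentence) should grow by 1, giving [batch, max_len + 1]. Since you added both _GO and _EOS, the target sentence length grows by 2, which gives 12.

I have read other people's NMT implementations: they only append _EOS to the target sentence, while _GO is prepended to the decoder input. Tell me if I am wrong.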
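A minimal sketch of that convention, in plain Python: _GO goes only on the decoder input and _EOS only on the target, so both end up with the same length. The token ids and the helper name make_decoder_pair are hypothetical and not taken from the question's code.

```python
GO, EOS, PAD = 1, 2, 0  # hypothetical token ids


def make_decoder_pair(tokens, max_len):
    """Return (decoder_input, target), each padded to max_len + 1 steps."""
    decoder_input = [GO] + tokens   # _GO only on the decoder input
    target = tokens + [EOS]         # _EOS only on the target
    pad = [PAD] * (max_len - len(tokens))
    return decoder_input + pad, target + pad


dec_in, tgt = make_decoder_pair([5, 6, 7], max_len=4)
print(dec_in)  # [1, 5, 6, 7, 0]
print(tgt)     # [5, 6, 7, 2, 0]
```

With this layout the targets are one step longer than the raw sentence rather than two, which keeps them aligned with the decoder's time axis.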

Answer 3

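I had the same error as you, and I figured out the problem:

The problem:

You run the decoder with the following parameters:

targets are the decoder inputs. Because of padding, they have length max_length. Shape: [batch_size, max_length]
sequence_length is the non-padded length of each target in the current batch. Shape: [batch_size]

Your logits, i.e. the output of tf.contrib.seq2seq.dynamic_decode, have shape: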

[batch_size, longer_sequence_in_this_batch, n_classes]

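where longer_sequence_in_this_batch is equal to tf.reduce_max(sequence_length)

So you run into a problem when computing the loss, because you try to use both:

Your logits, whose time axis (dimension 1) has size longer_sequence_in_this_batch
Your targets, whose time axis (dimension 1) has size max_length

Note that longer_sequence_in_this_batch <= max_length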
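The size of the gap can be worked out with plain Python. The sketch below assumes, as described above, that the decoder stops at the longest sequence in the batch; the lengths in sequence_length are made up for illustration.

```python
max_length = 12                      # padded target length, as in the question
sequence_length = [10, 7, 9, 4]      # hypothetical non-padded lengths for one batch

# Time steps the decoder is assumed to emit for this batch.
decoded_steps = max(sequence_length)

# Steps the logits are missing compared to the padded targets.
missing_steps = max_length - decoded_steps

print(decoded_steps)  # 10
print(missing_steps)  # 2
```

Whenever missing_steps is greater than zero, the logits and targets disagree on the time axis and the loss raises the error from the question.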

How to fix it:

You can simply apply some padding to the logits.

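logits  = self.decoder_logits_train
targets = self.decoder_train_targets

# max_length is the padded target length, i.e. the size of dimension 1 of targets.
# Pad only the end of the time axis (dimension 1); batch and class axes stay as-is.
paddings = [[0, 0], [0, max_length - tf.shape(logits)[1]], [0, 0]]
padded_logits = tf.pad(logits, paddings, 'CONSTANT', constant_values=0)

self.loss = seq2seq.sequence_loss(logits=padded_logits, targets=targets,
                                  weights=self.loss_weights)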

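With this approach, you make sure the logits are padded to the same length as the targets, with shape [batch_size, max_length, n_classes].

For more information about the pad function, see Tensorflow's documentation.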
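The effect of the padding can be checked outside TensorFlow with a NumPy sketch. It makes two assumptions that are not stated in the answer: the loss weights are zero on the padded time steps, and the loss is a weighted mean of the per-step cross-entropy. The helper masked_xent is a hypothetical stand-in for sequence_loss, not its actual implementation.

```python
import numpy as np


def masked_xent(logits, targets, weights):
    """Weighted mean of the per-step sparse softmax cross-entropy."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return float((-picked * weights).sum() / weights.sum())


rng = np.random.default_rng(0)
batch_size, decoded_steps, max_length, n_classes = 4, 3, 5, 6  # illustrative sizes

logits = rng.normal(size=(batch_size, decoded_steps, n_classes))
targets = rng.integers(0, n_classes, size=(batch_size, max_length))
weights = np.zeros((batch_size, max_length))
weights[:, :decoded_steps] = 1.0  # assumption: padded steps carry zero weight

# Same padding spec as the tf.pad call: only the end of the time axis grows.
paddings = [(0, 0), (0, max_length - decoded_steps), (0, 0)]
padded_logits = np.pad(logits, paddings, mode="constant", constant_values=0)

loss_padded = masked_xent(padded_logits, targets, weights)
loss_trimmed = masked_xent(logits, targets[:, :decoded_steps],
                           weights[:, :decoded_steps])

print(padded_logits.shape)                    # (4, 5, 6)
print(np.isclose(loss_padded, loss_trimmed))  # True
```

Under these assumptions the zero-padded steps do not change the loss value, so padding the logits only reconciles the shapes. If the weights were nonzero on padded steps, the all-zero logits there would contribute to the loss.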
