Question

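I am trying to build a language model. I have logits and targets, both of shape [32, 312, 512], where:

.shape[0] is batch_size
.shape[1] is sequence_max_len
.shape[2] is the vocabulary size

The question is: when I pass the logits and targets to the loss function like this: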

self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))

does it compute the correct loss for the current batch? Or should I reshape the logits and targets to the shape [32, 312*512]?

Thanks in advance for your help!

Answer 1

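The API documentation says this about labels:

labels: Each row labels[i] must be a valid probability distribution.

If you predict one character at a time, you have a probability distribution over your vocabulary of size 512 (the probabilities of all characters sum to 1). Given that, with labels and unscaled logits of shape [32, 312, 512], you should reshape both to [32 * 312, 512] before calling the function. That way each row of the labels is a valid probability distribution, and the function itself converts your unscaled logits into a probability distribution and then computes the loss.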
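
As a rough illustration of the reshape described above, here is a NumPy sketch rather than TensorFlow itself. The helper name `softmax_xent` and the small shapes are assumptions chosen for readability, standing in for the [32, 312, 512] tensors. It flattens [batch, seq_len, vocab] to [batch * seq_len, vocab] so that every label row is a one-hot distribution summing to 1, and shows that the [batch, seq_len * vocab] layout from the question does not meet that requirement.

```python
import numpy as np

def softmax_xent(logits, labels):
    """Per-row cross-entropy between `labels` and softmax(`logits`) for 2-D inputs."""
    # Numerically stable log-softmax over the columns (the class dimension).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Small stand-ins for the shapes in the question (assumed sizes).
batch, seq_len, vocab = 2, 3, 5
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch, seq_len, vocab))
target_ids = rng.integers(0, vocab, size=(batch, seq_len))
labels = np.eye(vocab)[target_ids]  # one-hot, shape [batch, seq_len, vocab]

# Reshape suggested in this answer: one row per (example, time step).
flat_logits = logits.reshape(batch * seq_len, vocab)
flat_labels = labels.reshape(batch * seq_len, vocab)
row_sums_ok = np.allclose(flat_labels.sum(axis=1), 1.0)  # each row is a distribution

per_token_loss = softmax_xent(flat_logits, flat_labels)  # shape [batch * seq_len]
mean_loss = per_token_loss.mean()

# Reshape proposed in the question: each row now sums to seq_len, not 1.
bad_labels = labels.reshape(batch, seq_len * vocab)
bad_row_sums = bad_labels.sum(axis=1)
```

With the [batch, seq_len * vocab] layout, a softmax over each row would also normalize across all time steps at once, which is not the per-token distribution a language model needs.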

Answer 2

The answer is: it doesn't matter, because tf.nn.softmax_cross_entropy_with_logits() has a dim argument:

dim: The class dimension. Defaulted to -1 which is the last dimension.
name: A name for the operation (optional).

Also, inside tf.nn.softmax_cross_entropy_with_logits() the logits and labels are flattened into matrices along the class dimension before the loss is computed, so inputs of shape [32, 312, 512] are handled the same way as [32 * 312, 512].
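
To see why the rank of the input does not matter when the class dimension is the last one, the following NumPy sketch may help. It is an illustration under assumptions, not the TensorFlow implementation: `softmax_xent_last_axis` is a hypothetical helper and the shapes are small stand-ins. It computes the loss along axis -1 of the 3-D tensors directly and compares it with the result on the flattened [batch * seq_len, vocab] matrices.

```python
import numpy as np

def softmax_xent_last_axis(logits, labels):
    """Cross-entropy along the last axis, for inputs of any rank >= 1."""
    # Numerically stable log-softmax over the class (last) dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

# Small stand-ins for [32, 312, 512] (assumed sizes).
batch, seq_len, vocab = 2, 3, 5
rng = np.random.default_rng(1)
logits = rng.normal(size=(batch, seq_len, vocab))
labels = np.eye(vocab)[rng.integers(0, vocab, size=(batch, seq_len))]

# Loss on the original 3-D tensors: one value per (example, time step).
loss_3d = softmax_xent_last_axis(logits, labels)  # shape [batch, seq_len]

# Loss after flattening the outer dimensions into rows.
loss_2d = softmax_xent_last_axis(
    logits.reshape(-1, vocab), labels.reshape(-1, vocab))  # shape [batch * seq_len]

# The mean over all tokens is the same either way.
same_mean = np.isclose(loss_3d.mean(), loss_2d.mean())
```

Because the per-token values agree, taking the mean over the 3-D result gives the same batch loss as taking it over the flattened result.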

TensorFlow - predicting the next word - loss function logit and target shape
