I have a simple NN model that looks like this:
```python
class TestRNN(nn.Module):
    def __init__(self, batch_size, n_steps, n_inputs, n_neurons, n_outputs):
        super(TestRNN, self).__init__()
        ...
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def forward(self, X):
        ...
        lstm_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)
        return out.view(-1, self.n_outputs)
```
and I am using criterion = nn.CrossEntropyLoss() to compute the error. The sequence of operations is as follows:
```python
# get the inputs
x, y = data
# forward + backward + optimize
outputs = model(x)
loss = criterion(outputs, y)
```
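The snippet stops after computing the loss; for completeness, one full "forward + backward + optimize" step would look roughly like this, with a stand-in linear model and random data in place of the question's actual model and data loader:

```python
import torch
import torch.nn as nn

# Minimal sketch of one training step; model, optimizer and data are
# placeholders, not the question's actual objects.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

data = (torch.randn(64, 10), torch.randint(0, 2, (64,)))

# get the inputs
x, y = data
# forward + backward + optimize
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
print(loss.item())
```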
My training data x is normalized and looks like this:
tensor([[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[2.6164e-02, 2.6164e-02, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 1.3108e-05],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[9.5062e-01, 3.1036e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.3717e-05, 3.2659e-07, ..., 0.0000e+00,
0.0000e+00, 3.2659e-07]],
[[5.1934e-01, 5.4041e-01, 6.8083e-06, ..., 0.0000e+00,
0.0000e+00, 6.8083e-06],
[5.2340e-01, 6.0007e-01, 2.7062e-06, ..., 0.0000e+00,
0.0000e+00, 2.7062e-06],
[8.1923e-01, 5.7346e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0714e-01, 7.0708e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 7.0407e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
...,
[[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.1852e-01, 2.3411e-02, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0775e-01, 7.0646e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 3.9888e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[5.9611e-01, 5.8796e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0710e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.7538e-01, 2.4842e-01, 1.7787e-06, ..., 0.0000e+00,
0.0000e+00, 1.7787e-06],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[5.2433e-01, 5.2433e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[1.3155e-01, 1.3155e-01, 0.0000e+00, ..., 8.6691e-02,
9.7871e-01, 0.0000e+00],
[7.4412e-01, 6.6311e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[7.0711e-01, 7.0711e-01, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 9.6093e-07]]])
A typical output and y passed to the criterion look like this:
tensor([[-0.0513],
[-0.0445],
[-0.0514],
[-0.0579],
[-0.0539],
[-0.0323],
[-0.0521],
[-0.0294],
[-0.0372],
[-0.0518],
[-0.0516],
[-0.0501],
[-0.0312],
[-0.0496],
[-0.0436],
[-0.0514],
[-0.0518],
[-0.0465],
[-0.0530],
[-0.0471],
[-0.0344],
[-0.0502],
[-0.0536],
[-0.0594],
[-0.0356],
[-0.0371],
[-0.0513],
[-0.0528],
[-0.0621],
[-0.0404],
[-0.0403],
[-0.0562],
[-0.0510],
[-0.0580],
[-0.0516],
[-0.0556],
[-0.0063],
[-0.0459],
[-0.0494],
[-0.0460],
[-0.0631],
[-0.0525],
[-0.0454],
[-0.0509],
[-0.0522],
[-0.0426],
[-0.0527],
[-0.0423],
[-0.0572],
[-0.0308],
[-0.0452],
[-0.0555],
[-0.0479],
[-0.0513],
[-0.0514],
[-0.0498],
[-0.0514],
[-0.0471],
[-0.0505],
[-0.0467],
[-0.0485],
[-0.0520],
[-0.0517],
[-0.0442]], device='cuda:0', grad_fn=<ViewBackward>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
When the criterion is applied, the following error occurs (run with CUDA_LAUNCH_BLOCKING=1):
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Is the fact that my model outputs negative values causing the error message above, and how can I fix it?
Answer 0 (score: 2)
TL;DR
You have two options:
1. Make the second dimension of outputs of size 2 instead of 1.
2. Use nn.BCEWithLogitsLoss instead of nn.CrossEntropyLoss.
I think the problem is not the negative numbers; it is the shape of outputs.
Looking at your array y, I see that you have 2 different classes (maybe more, but let's assume it's 2). This means the last dimension of outputs should be 2. The reason is that outputs needs to give a "score" for each of the 2 different classes (see the documentation). The score can be negative, zero, or positive. But your outputs has shape [64, 1], not the required [64, 2].
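This shape requirement can be checked in isolation; below is a sketch using random logits and labels placed as in the question's y (the 1s at positions 39 and 52), on CPU rather than CUDA:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Shapes matching the question: batch of 64, 2 classes.
# CrossEntropyLoss expects one raw score (logit) per class: shape [64, 2].
outputs_ok = torch.randn(64, 2)     # logits may be negative; that is fine
y = torch.zeros(64, dtype=torch.long)
y[39] = 1
y[52] = 1                           # labels as in the question's y

loss = criterion(outputs_ok, y)     # works: every target is < n_classes

# With shape [64, 1], n_classes is inferred as 1, so any target equal to 1
# is out of range. On CUDA this surfaces as the device-side assert
# `t >= 0 && t < n_classes`; on CPU it raises an error directly.
outputs_bad = torch.randn(64, 1)
try:
    criterion(outputs_bad, y)
    shape_error = False
except (IndexError, RuntimeError):
    shape_error = True
print(loss.item(), shape_error)
```

This reproduces the question's failure mode: the negative logit values are harmless, but the `[64, 1]` shape is not.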
One of the steps the nn.CrossEntropyLoss() object performs is to convert these scores into a probability distribution over the two classes. This is done using a softmax operation. However, when doing binary classification (that is, classification with only 2 classes, as in our current case), there is another option: give a score for only one class, convert it into the probability of that class using the sigmoid function, and then compute "1 - p" to get the probability of the other class. This option means outputs needs to give a score for only one of the two classes, as in your current case. To choose this option, you need to change nn.CrossEntropyLoss to nn.BCEWithLogitsLoss. You can then pass it outputs and y as you currently do (note, however, that the shape of outputs must be exactly the shape of y, so in your example you will need to pass outputs[:,0] instead of outputs, and you will also need to convert y to float: y.float(). The call is therefore criterion(outputs[:,0], y.float())).
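A minimal sketch of this second option, assuming outputs of shape [64, 1] as in the question and the same label positions:

```python
import torch
import torch.nn as nn

# Stand-ins for the question's tensors: one logit per sample, labels 0/1.
outputs = torch.randn(64, 1)        # model scores (logits), may be negative
y = torch.zeros(64, dtype=torch.long)
y[39] = 1
y[52] = 1

criterion = nn.BCEWithLogitsLoss()

# outputs[:, 0] has shape [64], matching y's shape; BCE needs float targets.
loss = criterion(outputs[:, 0], y.float())
print(loss.item())
```

BCEWithLogitsLoss applies the sigmoid internally in a numerically stable way, so the model should output raw logits, not probabilities.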