Question

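Context: I have a set of documents, each with two associated probability values: the probability of belonging to class A and the probability of belonging to class B. The classes are mutually exclusive and the probabilities sum to 1. So, for example, document D has the probabilities (0.6, 0.4) as its ground truth.

Each document is represented by the tf-idf of the terms it contains, normalized to the range 0 to 1. I also tried doc2vec (normalized to the range -1 to 1) and several other representations.

I built a very simple neural network to predict this probability distribution: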

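An input layer with as many nodes as there are features
A single hidden layer with one node
An output layer with softmax and two nodes
A cross-entropy loss function
I have also tried different update functions and learning rates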
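
The architecture listed above can be sketched as a plain-Python forward pass. This is a minimal illustration, not the nolearn implementation: the weights and the input vector are made-up values, the ReLU hidden activation is an assumption (the question does not set one explicitly), and the loss is written as categorical cross-entropy against a soft target such as (0.6, 0.4).

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Features -> one hidden unit -> two softmax outputs."""
    pre = sum(wi * xi for wi, xi in zip(w_hidden, x)) + b_hidden
    h = max(0.0, pre)  # assumed ReLU activation on the single hidden unit
    logits = [w * h + b for w, b in zip(w_out, b_out)]
    return softmax(logits)


def cross_entropy(target, pred, eps=1e-12):
    """Categorical cross-entropy between a (possibly soft) target and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


# Hypothetical feature vector and weights, chosen only for illustration.
x = [0.2, 0.0, 0.7]
w_hidden = [0.5, -0.3, 0.8]
pred = forward(x, w_hidden, b_hidden=0.1, w_out=[1.0, -1.0], b_out=[0.0, 0.0])
loss = cross_entropy([0.6, 0.4], pred)
```

With a single hidden unit, both output logits depend on the same scalar `h`, so the whole network is a one-dimensional function of the input.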

Here is the code I wrote with nolearn:

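import lasagne
import nolearn.lasagne
from lasagne import layers
# `es` is an early-stopping helper module that is not shown in the question.

net = nolearn.lasagne.NeuralNet(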
    layers=[('input', layers.InputLayer),
        ('hidden1', layers.DenseLayer),
        ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    hidden1_num_units=1,
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)

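My problem is that my predictions have a cutoff point: nothing is predicted below that point (see the picture to understand what I mean). This plot shows the difference between the true probability and my predictions. The closer the points are to the red line, the better the prediction. Ideally, all points would lie on the line. How can I fix this, and why does it happen?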

Edit: Actually, simply removing the hidden layer solved the problem:

net = nolearn.lasagne.NeuralNet(
    layers=[('input', layers.InputLayer),
        ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)

But I still can't understand why I had this problem and why removing the hidden layer solved it. Any ideas?

Here is the new plot:

Answer 1

I think the output values in your training set should be [0,1] or [1,0];
[0.6,0.4] is not suitable for softmax / cross-entropy.
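
A different possible explanation for the cutoff, offered as an assumption rather than something established in this thread: if the single hidden unit uses a ReLU activation (believed to be Lasagne's default for `DenseLayer`, but not confirmed in the question), its output `h` is never negative. A two-way softmax equals a sigmoid of the logit difference, so the predicted probability of class A is `sigmoid(a * h + c)` with `h >= 0`, which is bounded on one side by `sigmoid(c)`. The sketch below uses hypothetical weights to show that bound.

```python
import math


def prob_a(pre_activation, w_out, b_out):
    """P(class A) for a one-hidden-unit network with an assumed ReLU activation."""
    h = max(0.0, pre_activation)  # ReLU: h is never negative
    # Softmax over two logits is a sigmoid of their difference.
    diff = (w_out[0] - w_out[1]) * h + (b_out[0] - b_out[1])
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical output weights and biases, for illustration only.
w_out, b_out = [1.5, -1.5], [-0.4, 0.4]

# Sweep the hidden pre-activation over a wide range of values.
sweep = [prob_a(p / 10.0, w_out, b_out) for p in range(-100, 101)]

# The value at h = 0 acts as a floor: no input can push the prediction below it.
floor = prob_a(0.0, w_out, b_out)
```

Under this assumption, removing the hidden layer makes the logit difference a plain linear function of the input, which can move in both directions, and that would be consistent with the cutoff disappearing in the edited version.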

Cutoff in neural network regression predictions
