Question

In every neural network classification example I have seen, each input in the training data has a single class as its main category or label.

Can you supply training data with more than one label per input? For example, an input labeled with both "cat" and "mouse".

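My understanding (possibly wrong) is that if you use softmax at the output layer for probabilities/predictions, it tends to pick a single class (maximizing discriminative power). I suspect this would hurt or prevent learning and predicting multiple labels for the same input.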

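Is there any approach or architecture for neural networks where the training data has multiple labels and the network makes multiple output predictions? Or is that already the case and I have missed something important? Please clarify.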

Answer 1

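Most examples do have a single class per input, so no, you have not missed anything. It is, however, possible to predict several classes per input; this is usually called multi-label classification (sometimes joint classification in the literature).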

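The naive softmax implementation you suggest will struggle because the outputs of the final layer must sum to 1, so the more classes you have, the harder it is to work out what the network is trying to say.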
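
To make the sum-to-1 constraint concrete, here is a minimal sketch in plain Python. The three class names and the raw scores are made up for illustration; the point is only that softmax forces two equally likely labels to share the probability mass, while per-class sigmoids score each label on its own.

```python
import math

def softmax(logits):
    """Normalize scores so that they compete with each other and sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash a single score into (0, 1), independently of the other scores."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores for the classes ["cat", "mouse", "dog"]:
# the network is equally confident that "cat" and "mouse" are both present.
logits = [4.0, 4.0, -4.0]

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]

print([round(p, 3) for p in softmax_probs])  # [0.5, 0.5, 0.0]
print([round(p, 3) for p in sigmoid_probs])  # [0.982, 0.982, 0.018]
```

With softmax, neither "cat" nor "mouse" can get much above 0.5 here, which is hard to tell apart from a network that is simply unsure between the two.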

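You can change the architecture to get what you want. For each class you could have a binary softmax classifier that branches off the penultimate layer, or you could use sigmoid outputs, which do not have to sum to 1 (even though each neuron still outputs a value between 0 and 1). Note that using a sigmoid might make training more difficult.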
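
As a rough sketch of what the sigmoid variant implies for the data, the targets become multi-hot vectors (one 0/1 slot per class), the loss is a binary cross-entropy per class, and predictions are read off by thresholding each output separately. The class list, the 0.5 threshold, and the example probabilities below are assumptions for illustration, not something the answer specifies.

```python
import math

CLASSES = ["cat", "mouse", "dog"]  # hypothetical label vocabulary

def multi_hot(labels, classes=CLASSES):
    """Encode a set of labels as a 0/1 vector with one slot per class."""
    return [1.0 if c in labels else 0.0 for c in classes]

def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean of the per-class binary cross-entropy terms."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)

def decode(probs, threshold=0.5, classes=CLASSES):
    """Return every class whose independent probability clears the threshold."""
    return [c for c, p in zip(classes, probs) if p >= threshold]

target = multi_hot({"cat", "mouse"})  # [1.0, 1.0, 0.0]

good_probs = [0.9, 0.8, 0.1]  # made-up sigmoid outputs that find both labels
bad_probs = [0.9, 0.1, 0.1]   # made-up sigmoid outputs that miss "mouse"

print(decode(good_probs), binary_cross_entropy(target, good_probs))
print(decode(bad_probs), binary_cross_entropy(target, bad_probs))
```

The loss is lower for the prediction that recovers both labels, which is the signal a network with sigmoid outputs would be trained on.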

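Alternatively, you could train a separate network for each class and then combine them into a single classification system at the end. It depends on how complex the task you have in mind is.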
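
The one-model-per-class idea can be sketched as follows. To keep it short and dependency-free, each "network" here is just a logistic regression trained by gradient descent, and the toy data, learning rate, and step count are all made up; a real setup would substitute proper networks for `train_binary_classifier`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary_classifier(inputs, targets, lr=0.5, steps=200):
    """Fit one logistic-regression stand-in 'network' with plain gradient descent."""
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - t  # gradient of binary cross-entropy w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(inputs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(inputs)
    return weights, bias

def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Toy data (hypothetical): feature 0 signals "cat", feature 1 signals "mouse".
inputs = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
label_sets = [{"cat", "mouse"}, {"cat"}, {"mouse"}, set()]
classes = ["cat", "mouse"]

# Train one independent binary classifier per class.
models = {
    c: train_binary_classifier(inputs, [1.0 if c in ls else 0.0 for ls in label_sets])
    for c in classes
}

def predict_labels(x, threshold=0.5):
    """Combine the per-class classifiers into one multi-label prediction."""
    return {c for c in classes if predict_proba(models[c], x) >= threshold}

for x, expected in zip(inputs, label_sets):
    print(x, sorted(predict_labels(x)), sorted(expected))
```

The trade-off is that the per-class models share nothing, so common features are learned once per class instead of once overall.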

Answer 2

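"Is there any approach or architecture for neural networks where the training data has multiple labels and the network makes multiple output predictions?"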

The answer is yes. To answer your question briefly, here is an example using Keras, a high-level neural network library.

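Consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model is the headline itself, as a sequence of words, but to make things more interesting, the model also has an auxiliary input that receives extra data, such as the time of day the headline was posted.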

from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

# Headline input: meant to receive sequences of 100 integer word indices,
# each in the range 0 to 9999 (the Embedding below has input_dim=10000).
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# This embedding layer encodes the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000)(main_input)

# An LSTM transforms the vector sequence into a single vector
# containing information about the entire sequence.
lstm_out = LSTM(32)(x)

# Auxiliary output: a second prediction head fed directly by the LSTM.
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

# Auxiliary input: 5 extra features, concatenated with the LSTM output.
auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])

# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

This defines a model with two inputs and two outputs:

model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

Now compile and train the model as follows:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

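# Train it. headline_data, additional_data and labels are assumed to be
# arrays prepared elsewhere; both outputs are trained on the same labels.
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)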
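
To see what `loss_weights` does, recall that Keras minimizes the weighted sum of the individual output losses. The following plain-Python sketch reproduces that arithmetic for one tiny batch; the prediction values are invented, and `binary_cross_entropy` is a hand-written helper, not the Keras function.

```python
import math

def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)

# Made-up batch of 4 examples; both heads share the same targets,
# as in the fit() call above.
labels = [1.0, 0.0, 1.0, 0.0]
main_pred = [0.9, 0.2, 0.7, 0.1]  # hypothetical outputs of 'main_output'
aux_pred = [0.6, 0.4, 0.5, 0.5]   # hypothetical outputs of 'aux_output'

loss_weights = {"main_output": 1.0, "aux_output": 0.2}
losses = {
    "main_output": binary_cross_entropy(labels, main_pred),
    "aux_output": binary_cross_entropy(labels, aux_pred),
}

# The quantity being minimized: 1.0 * main loss + 0.2 * auxiliary loss.
total_loss = sum(loss_weights[name] * losses[name] for name in losses)
print(losses, total_loss)
```

A weight of 0.2 lets the auxiliary head contribute some gradient to the LSTM without dominating the main objective.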

Reference: Multi-input and multi-output models in Keras

Neural network classification: does each training example always have to have exactly one label?