I wrote a model that classifies text using one-hot encoding. Here is my code (I have left out the imports to keep it short):
# Generate one-hot encodings for data and labels
num_labels = len(set(train_label_text))
# vocab_size = len(list(itertools.chain.from_iterable(train_data_text)))
vocab_size = 10000
label_tokenizer = keras.preprocessing.text.Tokenizer(num_words=num_labels)
label_tokenizer.fit_on_texts(train_label_text)
train_label = label_tokenizer.texts_to_matrix(train_label_text)
data_tokenizer = keras.preprocessing.text.Tokenizer(num_words=vocab_size)
data_tokenizer.fit_on_texts(train_data_text_unsplit)
train_data = data_tokenizer.texts_to_matrix(train_data_text)
# Model
model = keras.Sequential()
model.add(keras.layers.Dense(8, input_shape=(vocab_size,), activation=tf.nn.relu))
model.add(keras.layers.Dense(num_labels, activation=tf.nn.sigmoid))
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
x_val = train_data[:leave_alone]
partial_x_train = train_data[leave_alone:]
y_val = train_label[:leave_alone]
partial_y_train = train_label[leave_alone:]
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=epochs,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)
My model summary is as follows:
dense_40 (Dense) (None, 8) 80008
Total params: 80,134 Trainable params: 80,134 Non-trainable params: 0
My data and targets look like this:
First two records of the input: array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]])
First two records of the output: array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
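The number of values in each output row matches the 14 in the model summary. However, when I run the model, I get the following error:
ValueError: Error when checking target: expected dense_43 to have shape (1,) but got array with shape (14,)
I have looked at examples online and at answers to other questions here, but I can't see what I'm doing wrong. Is there something obvious that I'm missing? Thanks!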
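For anyone trying to reproduce this, the two shapes named in the error can be shown with plain NumPy. This is only a minimal sketch with made-up stand-in data, not my real labels. It assumes the (1,) in the message means one integer class index per sample, which is the target format that `sparse_categorical_crossentropy` expects, whereas `texts_to_matrix` gives one row of 14 values per sample.

```python
import numpy as np

num_labels = 14  # matches the 14 values per target row shown above

# Stand-in for two rows of train_label: each row has a single 1 at index 3
one_hot = np.zeros((2, num_labels))
one_hot[:, 3] = 1.0

# Integer class indices: one value per sample instead of a row of 14
class_ids = one_hot.argmax(axis=1)

print(one_hot.shape)    # (2, 14)
print(class_ids.shape)  # (2,)
print(class_ids)        # [3 3]
```

Whether the targets should be converted like this, or the loss changed to one that accepts one-hot rows, is exactly the part I am unsure about.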