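I am using the code below to label-encode my classes with a Tokenizer. The encoding works, but the ids start at 1 rather than 0. How can I make the encoding start from 0?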
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)
training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))
The following shows that the ids start at 1:
label_tokenizer.word_index
{'credit': 10,
'deduction': 9,
'notification': 6,
'notificationcredit': 4,
'notificationfailed': 8,
'notificationfinancial': 1,
'notificationimportant': 2,
'notificationreminder': 7,
'notificationsuccess': 11,
'otp': 3,
'personal': 5,
'promotion': 12,
'reminder': 13}
The goal is to use these labels when training with TensorFlow. If the label encoding starts at 1, I get this error: Received a label value of 13 which is outside the valid range of [0, 13)
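The range in that error message can be reproduced without TensorFlow. This is a minimal sketch, assuming 13 classes and the 1-based ids shown in word_index above; out_of_range is a hypothetical helper written for illustration, not a Keras or TensorFlow function.

```python
# Hypothetical helper: return the label ids that fall outside [0, num_classes).
def out_of_range(label_ids, num_classes):
    return [i for i in label_ids if not 0 <= i < num_classes]

num_classes = 13
one_based = list(range(1, 14))            # ids as listed in word_index: 1..13
zero_based = [i - 1 for i in one_based]   # the same ids shifted down: 0..12

bad_before = out_of_range(one_based, num_classes)   # the id 13 is rejected
bad_after = out_of_range(zero_based, num_classes)   # nothing is rejected
```

With 13 output units the valid ids are 0 through 12, so only the largest 1-based id is flagged, which matches the value named in the error.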
Here is the model definition. To make things work for now, I added +1 to the total number of classes in the last layer:
model = keras.Sequential([
keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
keras.layers.Dense(y_train.shape[1]+1, activation="softmax")])
Answer 0 (score: 0)
I ran into exactly the same problem this morning, down to the same variable names... ;D
Marco Cerliani is right. To quote him, just do this:
training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels)) - 1
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels)) - 1
You do need to keep the actual number of classes in the output layer, so make sure to remove the +1:
model = keras.Sequential([
keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
keras.layers.Dense(y_train.shape[1], activation="softmax")])
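If the Tokenizer is only being used to turn label strings into integers, another option is to build a zero-based mapping directly and skip the subtraction. This is a sketch under assumptions: the sample labels are made up, and label_to_id and encode are hypothetical names. Note that the ids here are assigned in alphabetical order, not in the order Tokenizer would assign them.

```python
# Assumed sample data; in the question these would be the real label lists.
labels = ["otp", "credit", "promotion", "otp", "reminder"]

# Assign ids 0..n-1 to the distinct labels, in alphabetical order.
label_to_id = {label: i for i, label in enumerate(sorted(set(labels)))}

def encode(label_list):
    """Map each label string to its zero-based integer id."""
    return [label_to_id[label] for label in label_list]

train_ids = encode(["otp", "reminder", "credit"])
num_classes = len(label_to_id)  # size for the final Dense layer, no +1 needed
```

Because the ids already run from 0 to num_classes - 1, the output layer can use num_classes units directly.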