Question

I'm using the code below to label-encode with Tokenizer. It encodes the labels, but the encoding starts at 1 rather than 0. How can I make it start at 0?

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer

label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)

training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))

The following shows that it starts at 1:

label_tokenizer.word_index

{'credit': 10,
 'deduction': 9,
 'notification': 6,
 'notificationcredit': 4,
 'notificationfailed': 8,
 'notificationfinancial': 1,
 'notificationimportant': 2,
 'notificationreminder': 7,
 'notificationsuccess': 11,
 'otp': 3,
 'personal': 5,
 'promotion': 12,
 'reminder': 13}
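The Keras Tokenizer reserves index 0 (it is never assigned to a word), so word_index always starts at 1. A minimal sketch of a 0-based mapping derived from the word_index printed above, assuming each label is a single word; the encode helper is hypothetical, not part of Keras:

```python
# word_index as printed above by label_tokenizer.word_index (1-based; 0 is reserved)
word_index = {
    'credit': 10, 'deduction': 9, 'notification': 6, 'notificationcredit': 4,
    'notificationfailed': 8, 'notificationfinancial': 1, 'notificationimportant': 2,
    'notificationreminder': 7, 'notificationsuccess': 11, 'otp': 3,
    'personal': 5, 'promotion': 12, 'reminder': 13,
}

# Shift every id down by one so the labels occupy 0 .. len(word_index) - 1.
zero_based_index = {label: idx - 1 for label, idx in word_index.items()}

def encode(labels):
    """Map a list of single-word labels to 0-based integer ids (hypothetical helper)."""
    return [zero_based_index[label] for label in labels]

print(encode(['notificationfinancial', 'reminder', 'otp']))  # [0, 12, 2]
```

Because the Tokenizer assigns the ids 1..13 with no gaps here, subtracting 1 yields exactly 0..12.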

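The goal is to use these labels when training a TensorFlow model. If the label encoding starts at 1, I get the error: Received a label value of 13 which is outside the valid range of [0, 13)
Here is the model definition. To get things working for now, I added +1 to the total number of classes in the last layer: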
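With N output units, integer (sparse) labels must lie in [0, N). A plain-Python sketch of the condition behind the error, assuming 13 classes as in the word_index above; out_of_range is a hypothetical helper, not a TensorFlow function:

```python
def out_of_range(labels, num_classes):
    """Return the labels a sparse softmax loss with num_classes outputs would reject."""
    return [y for y in labels if not 0 <= y < num_classes]

num_classes = 13                          # 13 distinct labels in word_index
one_based = list(range(1, 14))            # ids as produced by the Tokenizer: 1..13
zero_based = [y - 1 for y in one_based]   # ids after subtracting 1: 0..12

print(out_of_range(one_based, num_classes))   # [13]
print(out_of_range(zero_based, num_classes))  # []
```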

model = keras.Sequential([
  keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
  keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
  keras.layers.Dense(y_train.shape[1]+1, activation="softmax")])

Answer 1

This morning I ran into exactly the same problem, with exactly the same variable names as you... ;D

Marco Cerliani is right. To quote him, just do this:

training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels)) - 1
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels)) - 1
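The subtraction works because texts_to_sequences returns one inner list per label, so np.array produces an integer array and NumPy broadcasting subtracts 1 from every element. A small sketch with hypothetical sequences standing in for the Tokenizer output:

```python
import numpy as np

# Hypothetical output of label_tokenizer.texts_to_sequences(train_labels) for three
# single-word labels: one inner list per label, ids starting at 1.
sequences = [[1], [13], [3]]

training_label_seq = np.array(sequences) - 1  # subtract 1 from every element
print(training_label_seq.shape)               # (3, 1)
print(training_label_seq.ravel().tolist())    # [0, 12, 2]
```

This relies on every label mapping to exactly one token; a label that the Tokenizer splits into several tokens would give sequences of different lengths, and the shift would no longer yield one class id per sample.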

You still need to keep the number of output classes you actually want, so make sure to remove the +1:

model = keras.Sequential([
  keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
  keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
  keras.layers.Dense(y_train.shape[1], activation="softmax")])
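Once the model is trained on shifted labels, the argmax of its softmax output is a 0-based id, so 1 has to be added back before looking it up in the Tokenizer's 1-based index_word. A sketch assuming a hand-copied subset of index_word and a made-up probability vector; decode_prediction is hypothetical:

```python
# Subset of label_tokenizer.index_word (id -> label), 1-based like word_index.
index_word = {1: 'notificationfinancial', 2: 'notificationimportant', 3: 'otp', 13: 'reminder'}

def decode_prediction(probabilities, index_word):
    """Turn one softmax output row into a label name.

    The model was trained on ids shifted down by 1, so add 1 back
    before looking the id up in the Tokenizer's 1-based index_word.
    """
    predicted_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_word[predicted_id + 1]

# Hypothetical softmax output over 13 classes; position 2 has the highest score.
probs = [0.01] * 13
probs[2] = 0.88
print(decode_prediction(probs, index_word))  # otp
```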

tf.texts_to_sequences starting from 0
