tf.texts_to_sequences: start encoding from 0

Asked: 2020-07-13 19:48:40

Tags: tensorflow keras

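I am using the code below to encode my labels with a Tokenizer. The encoding works, but it starts at 1 instead of 0. How can I make it start encoding from 0?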

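import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer  # assumed TF 2.x import path

label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)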

training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))

The following shows that it starts from 1:

label_tokenizer.word_index

{'credit': 10,
 'deduction': 9,
 'notification': 6,
 'notificationcredit': 4,
 'notificationfailed': 8,
 'notificationfinancial': 1,
 'notificationimportant': 2,
 'notificationreminder': 7,
 'notificationsuccess': 11,
 'otp': 3,
 'personal': 5,
 'promotion': 12,
 'reminder': 13}

The goal is to use these labels to train a TensorFlow model. When the label encoding starts at 1, I get the error: Received a label value of 13 which is outside the valid range of [0, 13)
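
As a quick sanity check on the mapping and the error above, here is a minimal sketch in plain Python (the dictionary is copied from the output above, and no TensorFlow is needed). It assumes the final softmax layer has one unit per class, which is what the valid range [0, 13) in the error suggests. The Keras Tokenizer documentation notes that 0 is a reserved index that is never assigned to a word, which is why the indices start at 1.

```python
# word_index copied from the output above (1-based, as produced by the Tokenizer)
word_index = {
    'credit': 10, 'deduction': 9, 'notification': 6, 'notificationcredit': 4,
    'notificationfailed': 8, 'notificationfinancial': 1,
    'notificationimportant': 2, 'notificationreminder': 7,
    'notificationsuccess': 11, 'otp': 3, 'personal': 5, 'promotion': 12,
    'reminder': 13,
}

indices = sorted(word_index.values())
num_classes = len(word_index)
print(indices[0], indices[-1], num_classes)  # 1 13 13

# A softmax layer with num_classes units accepts labels 0 .. num_classes - 1,
# so the largest index falls outside that range.
out_of_range = [i for i in indices if not 0 <= i < num_classes]
print(out_of_range)  # [13]
```
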
Here is the model definition. To get things working for now, I added +1 to the total number of classes in the last layer:

model = keras.Sequential([
  keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
  keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
  keras.layers.Dense(y_train.shape[1]+1, activation="softmax")])

1 Answer:

Answer 0 (score: 0)

This morning I ran into this with exactly the same variable names as yours... ;D

Marco Cerliani is right. Quoting him, just do this:

training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels)) - 1
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels)) - 1
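
To see what the subtraction does to the labels, here is a small sketch in plain Python that mimics the lookup-and-shift step without TensorFlow or NumPy. The dictionary is copied from the question. The `sample_labels` list is hypothetical and only there for illustration, and the list comprehension is a stand-in for `np.array(label_tokenizer.texts_to_sequences(...)) - 1`, not the real call.

```python
# word_index copied from the question (1-based, as produced by the Tokenizer)
word_index = {
    'credit': 10, 'deduction': 9, 'notification': 6, 'notificationcredit': 4,
    'notificationfailed': 8, 'notificationfinancial': 1,
    'notificationimportant': 2, 'notificationreminder': 7,
    'notificationsuccess': 11, 'otp': 3, 'personal': 5, 'promotion': 12,
    'reminder': 13,
}

# Hypothetical sample labels, for illustration only
sample_labels = ["otp", "credit", "reminder"]

# Look up each label, then shift by one so the ids start at 0
zero_based = [word_index[label] - 1 for label in sample_labels]
print(zero_based)  # [2, 9, 12]

# After the shift, every id lies in 0 .. 12, matching a 13-unit softmax
all_shifted = sorted(idx - 1 for idx in word_index.values())
print(all_shifted == list(range(len(word_index))))  # True
```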

You do want the output layer to have exactly the number of classes you need, so make sure to remove the +1:

model = keras.Sequential([
  keras.layers.Embedding(input_dim=max_words, output_dim=64, input_length=input_dim),
  keras.layers.Bidirectional(keras.layers.LSTM(64)), #, return_sequences=True
  keras.layers.Dense(y_train.shape[1], activation="softmax")])
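
One consequence of the shift worth keeping in mind: the model's predicted class ids are now 0-based, while the tokenizer's own mapping is still 1-based. The sketch below, in plain Python, builds an inverse map for the shifted ids from the dictionary in the question. The `predicted_ids` values are hypothetical, standing in for the argmax of the softmax output.

```python
# word_index copied from the question (1-based, as produced by the Tokenizer)
word_index = {
    'credit': 10, 'deduction': 9, 'notification': 6, 'notificationcredit': 4,
    'notificationfailed': 8, 'notificationfinancial': 1,
    'notificationimportant': 2, 'notificationreminder': 7,
    'notificationsuccess': 11, 'otp': 3, 'personal': 5, 'promotion': 12,
    'reminder': 13,
}

# Inverse map for the shifted labels: 0-based class id -> label name
id_to_label = {idx - 1: label for label, idx in word_index.items()}

# Hypothetical predicted class ids, e.g. the argmax of the softmax output
predicted_ids = [0, 12, 4]
decoded = [id_to_label[i] for i in predicted_ids]
print(decoded)  # ['notificationfinancial', 'reminder', 'personal']
```

With the real tokenizer, the same lookup would presumably go through `label_tokenizer.index_word[i + 1]`, since that attribute uses the original 1-based indices.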