Question

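I have been working on a Kaggle competition, but my model is not performing well enough to help me.

I have already tried an LSTM model, but the accuracy is still low. I one-hot encoded the dataset and then converted the text to sequences.

Tokenization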

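# Imports assumed from the names used below (standalone Keras API).
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence

tok = Tokenizer(num_words=max_features, filters='!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~', lower=True)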
tok.fit_on_texts(list(X_train) + list(X_test))
x_train = tok.texts_to_sequences(X_train)
x_test = tok.texts_to_sequences(X_test)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
word_index = tok.word_index
# The Tokenizer reserves index 0 for padding and numbers real words from 1,
# so adding 'PAD'/'START'/'UNX' entries here would collide with existing indices.
print('found %s unique tokens' % len(word_index))
print('Average train sequence length: {}'.format(np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(np.mean(list(map(len, x_test)), dtype=int)))

x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Model

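from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, Bidirectional, LSTM, Dense
from keras.callbacks import TensorBoard

model = Sequential()
# Vocabulary size and input length should match the Tokenizer (max_features) and pad_sequences (maxlen).
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(SpatialDropout1D(0.4))
model.add(Bidirectional(LSTM(128, dropout=0.4, recurrent_dropout=0.3)))
model.add(Dense(128, activation='relu'))  # input size is inferred from the previous layer
model.add(Dense(128, activation='relu'))
model.add(Dense(5, activation="softmax"))  # 5 output classes, one-hot labels expected
# A forward slash keeps the log path portable and avoids a backslash escape.
tensorboard = TensorBoard(log_dir="log/{}".format(NAME))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])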

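The accuracy is very low, 0.51 at most, and one label is far more frequent than the others.

Is accuracy usually this low for multi-class labels?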
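
One way to put the 0.51 figure in context is to compare it with the accuracy of always predicting the most frequent label, and to weight the rarer classes more heavily during training. Below is a minimal sketch under stated assumptions: the labels are integer class ids (one-hot labels would first need converting back to ids), the example label list is made up, and both helper functions are hypothetical names introduced here. The weighting formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels for 5 classes in which class 2 dominates.
example_labels = [2] * 5 + [0, 1, 1, 3, 4]
baseline = majority_baseline(example_labels)
weights = balanced_class_weights(example_labels)
print('majority baseline accuracy:', baseline)
print('class weights:', weights)
```

If the model's accuracy is close to this baseline, it may be predicting mostly the dominant label. A dictionary like `weights` could then be passed to Keras through the `class_weight` argument of `model.fit`, which is one possible way to counter the imbalance.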

0 answers: