So I have a dataset of 30,000 social-media comments that have already been labeled into three classes (detractors, neutrals and promoters). To build the model I followed the instructions from this site: https://www.r-bloggers.com/how-to-prepare-data-for-nlp-text-classification-with-keras-and-tensorflow/
In the end I have something like this:
library(keras)
library(dplyr)

# Shuffle the rows and take an 80/20 train/test split
indice <- sample(nrow(total), replace = FALSE)
sample <- total[indice, ]  # note: this shadows base::sample
n <- 1:round(nrow(sample) * 0.8, 0)
df.train <- sample[n, ]
df.test <- sample[-n, ]

# Concatenate the two text columns into a single field
df.train <- mutate(df.train, text = paste(`Monitoramento`, `Texto do Comentário`))
text <- df.train$text

# Tokenize, keeping only the 8000 most frequent words
max_features <- 8000
tokenizer <- text_tokenizer(num_words = max_features)
tokenizer %>% fit_text_tokenizer(text)
text_seqs <- texts_to_sequences(tokenizer, text)
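As a sanity check on the tokenization (not part of the original pipeline, just base R), the sequence-length distribution can be inspected before choosing the padding length:

# Sketch: look at how long the tokenized comments are
seq_lens <- sapply(text_seqs, length)
summary(seq_lens)                       # min / median / max token counts
quantile(seq_lens, c(0.5, 0.9, 0.99))   # how many comments a cutoff of 100 would truncate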
# Hyperparameters
maxlen <- 100          # pad/truncate every sequence to 100 tokens
batch_size <- 32
embedding_dims <- 50
filters <- 64
kernel_size <- 3
hidden_dims <- 50
epochs <- 5

# Pad the integer sequences and pull out the labels
x_train <- text_seqs %>% pad_sequences(maxlen = maxlen)
y_train <- as.factor(df.train$`Sentimento do NPS`)
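The label column is a three-level factor; its levels and class balance can be checked like this (sketch):

levels(y_train)  # the three NPS classes
table(y_train)   # class balance in the training split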
# 1D CNN over the embedded tokens, with a single sigmoid output
# (taken as-is from the tutorial, which is binary)
model <- keras_model_sequential() %>%
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d(filters, kernel_size, padding = "valid", activation = "relu", strides = 1) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims) %>%
  layer_dropout(0.2) %>%
  layer_activation("relu") %>%
  layer_dense(1) %>%
  layer_activation("sigmoid") %>%
  compile(loss = "binary_crossentropy", optimizer = "adam", metrics = "accuracy")

hist <- model %>%
  fit(x_train, as.numeric(y_train),
      batch_size = batch_size, epochs = epochs, validation_split = 0.1)
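For completeness, this is roughly how the held-out split would be scored, reusing the fitted tokenizer and mirroring the training encoding (sketch):

# Sketch: apply identical preprocessing to the test split and evaluate
df.test <- mutate(df.test, text = paste(`Monitoramento`, `Texto do Comentário`))
test_seqs <- texts_to_sequences(tokenizer, df.test$text)
x_test <- test_seqs %>% pad_sequences(maxlen = maxlen)
y_test <- as.numeric(as.factor(df.test$`Sentimento do NPS`))  # same 1/2/3 encoding as training
model %>% evaluate(x_test, y_test, batch_size = batch_size)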
The problem is that the model performs really badly. With Naive Bayes I get around 80% accuracy, but with Keras I barely reach 10%. Something in the preprocessing or in the model construction must be wrong. Can anyone spot what it is?
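One thing I suspect but have not verified: the tutorial is written for binary classification, while my target has three classes, so feeding labels 1/2/3 into a layer_dense(1) + sigmoid head trained with binary crossentropy may simply not fit the task. A three-class head would presumably look something like this (a sketch, not tested, everything else unchanged):

# Sketch: softmax/categorical head instead of the tutorial's sigmoid/binary one
y_train_oh <- to_categorical(as.numeric(y_train) - 1, num_classes = 3)  # one-hot, 0-based

model <- keras_model_sequential() %>%
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d(filters, kernel_size, padding = "valid", activation = "relu", strides = 1) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims) %>%
  layer_dropout(0.2) %>%
  layer_activation("relu") %>%
  layer_dense(3) %>%  # one unit per class
  layer_activation("softmax") %>%
  compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = "accuracy")

hist <- model %>%
  fit(x_train, y_train_oh, batch_size = batch_size, epochs = epochs, validation_split = 0.1)

If that is the issue, is the rest of the pipeline (tokenization, padding, splitting) otherwise sound?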