So I have a dataset of 30,000 social-media comments that have already been labeled into three classes (detractors, neutrals and promoters). To build the model I followed the instructions from this site: https://www.r-bloggers.com/how-to-prepare-data-for-nlp-text-classification-with-keras-and-tensorflow/
In the end I have something like this:
library(keras)
library(dplyr)

# Shuffle the rows and take an 80/20 train/test split
indice <- sample(nrow(total), replace = FALSE)
sample <- total[indice, ]  # note: this shadows base::sample
n <- 1:round(nrow(sample) * 0.8, 0)
df.train <- sample[n, ]
df.test <- sample[-n, ]

# Concatenate the two text columns into a single field
df.train <- mutate(df.train, text = paste(`Monitoramento`, `Texto do Comentário`))
text <- df.train$text

# Tokenize, keeping only the 8000 most frequent words
max_features <- 8000
tokenizer <- text_tokenizer(num_words = max_features)
tokenizer %>% fit_text_tokenizer(text)
text_seqs <- texts_to_sequences(tokenizer, text)
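As a sanity check on the tokenization (not part of the original pipeline, just base R), the sequence-length distribution can be inspected before choosing the padding length:

# Sketch: look at how long the tokenized comments are
seq_lens <- sapply(text_seqs, length)
summary(seq_lens)                       # min / median / max token counts
quantile(seq_lens, c(0.5, 0.9, 0.99))   # how many comments a cutoff of 100 would truncate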
# Hyperparameters
maxlen <- 100          # pad/truncate every sequence to 100 tokens
batch_size <- 32
embedding_dims <- 50
filters <- 64
kernel_size <- 3
hidden_dims <- 50
epochs <- 5

# Pad the integer sequences and pull out the labels
x_train <- text_seqs %>% pad_sequences(maxlen = maxlen)
y_train <- as.factor(df.train$`Sentimento do NPS`)
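The label column is a three-level factor; its levels and class balance can be checked like this (sketch):

levels(y_train)  # the three NPS classes
table(y_train)   # class balance in the training split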
# 1D CNN over the embedded tokens, with a single sigmoid output
# (taken as-is from the tutorial, which is binary)
model <- keras_model_sequential() %>%
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d(filters, kernel_size, padding = "valid", activation = "relu", strides = 1) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims) %>%
  layer_dropout(0.2) %>%
  layer_activation("relu") %>%
  layer_dense(1) %>%
  layer_activation("sigmoid") %>%
  compile(loss = "binary_crossentropy", optimizer = "adam", metrics = "accuracy")

hist <- model %>%
  fit(x_train, as.numeric(y_train),
      batch_size = batch_size, epochs = epochs, validation_split = 0.1)
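For completeness, this is roughly how the held-out split would be scored, reusing the fitted tokenizer and mirroring the training encoding (sketch):

# Sketch: apply identical preprocessing to the test split and evaluate
df.test <- mutate(df.test, text = paste(`Monitoramento`, `Texto do Comentário`))
test_seqs <- texts_to_sequences(tokenizer, df.test$text)
x_test <- test_seqs %>% pad_sequences(maxlen = maxlen)
y_test <- as.numeric(as.factor(df.test$`Sentimento do NPS`))  # same 1/2/3 encoding as training
model %>% evaluate(x_test, y_test, batch_size = batch_size)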
The problem is that the model performs really badly. With Naive Bayes I get around 80% accuracy, but with Keras I barely reach 10%. Something in the preprocessing or in the model construction must be wrong. Can anyone spot what it is?
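One thing I suspect but have not verified: the tutorial is written for binary classification, while my target has three classes, so feeding labels 1/2/3 into a layer_dense(1) + sigmoid head trained with binary crossentropy may simply not fit the task. A three-class head would presumably look something like this (a sketch, not tested, everything else unchanged):

# Sketch: softmax/categorical head instead of the tutorial's sigmoid/binary one
y_train_oh <- to_categorical(as.numeric(y_train) - 1, num_classes = 3)  # one-hot, 0-based

model <- keras_model_sequential() %>%
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d(filters, kernel_size, padding = "valid", activation = "relu", strides = 1) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims) %>%
  layer_dropout(0.2) %>%
  layer_activation("relu") %>%
  layer_dense(3) %>%  # one unit per class
  layer_activation("softmax") %>%
  compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = "accuracy")

hist <- model %>%
  fit(x_train, y_train_oh, batch_size = batch_size, epochs = epochs, validation_split = 0.1)

If that is the issue, is the rest of the pipeline (tokenization, padding, splitting) otherwise sound?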