Keras performs poorly on text

Asked: 2019-01-24 02:00:15

Tags: r keras

So, I have a dataset of 30,000 social-media comments that have been classified into three categories (detractors, neutrals and promoters). To build the model I followed the instructions on this site: https://www.r-bloggers.com/how-to-prepare-data-for-nlp-text-classification-with-keras-and-tensorflow/

In the end, I got something like this:

indice <- sample(nrow(total), replace = FALSE)
sample <- total[indice,]
n <- 1:round(nrow(sample)*0.8,0)


df.train <- sample[n,]
df.test <- sample[-n,]

df.train <- mutate(df.train, text = paste(`Monitoramento`, `Texto do Comentário`))


text <- df.train$text
max_features <- 8000
tokenizer <- text_tokenizer(num_words = max_features) 

tokenizer %>% fit_text_tokenizer(text) 

text_seqs <- texts_to_sequences(tokenizer, text) 

maxlen <- 100
batch_size <- 32 
embedding_dims <- 50 
filters <- 64
kernel_size <- 3 
hidden_dims <- 50 
epochs <- 5 


x_train <- text_seqs %>% pad_sequences(maxlen = maxlen)
y_train <- as.factor(df.train$`Sentimento do NPS`)

model <- keras_model_sequential() %>% 
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d( filters, kernel_size, padding = "valid", activation = "relu", strides = 1 ) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims) %>%
  layer_dropout(0.2) %>% layer_activation("relu") %>%
  layer_dense(1) %>%
  layer_activation("sigmoid") %>%
  compile( loss = "binary_crossentropy", optimizer = "adam", metrics = "accuracy" ) 


hist <- model %>%
  fit(x_train, as.numeric(y_train), batch_size = batch_size, epochs = epochs, validation_split = 0.1)
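As a quick illustration of the label encoding being passed to `fit()` above: `as.numeric()` on a factor returns the 1-based level codes, which can be checked with a toy vector (hypothetical labels, not my actual data):

```r
# as.numeric() on a factor yields the 1-based integer codes
# of the levels (sorted alphabetically), not 0/1 targets
y <- as.factor(c("Detractor", "Neutral", "Promoter", "Neutral"))
as.numeric(y)
# [1] 1 2 3 2
```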

The problem is that the model performs really badly. With Naive Bayes I get 80% accuracy, but with Keras I barely reach 10%. Something in the preprocessing or in the model construction must be wrong. Can anyone identify what it is?
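For comparison, the usual Keras recipe for a three-class target is one-hot labels with a 3-unit softmax output and categorical cross-entropy, rather than a single sigmoid unit with binary cross-entropy. A rough sketch of that head, reusing the hyperparameters defined above (this is the general pattern, not code from the tutorial):

```r
library(keras)

# to_categorical() expects 0-based integer classes, so subtract 1
# from the 1-based factor codes before one-hot encoding
y_train_cat <- to_categorical(as.numeric(y_train) - 1, num_classes = 3)

model <- keras_model_sequential() %>%
  layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
  layer_dropout(0.2) %>%
  layer_conv_1d(filters, kernel_size, padding = "valid",
                activation = "relu", strides = 1) %>%
  layer_global_max_pooling_1d() %>%
  layer_dense(hidden_dims, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(3, activation = "softmax") %>%  # one output unit per class
  compile(loss = "categorical_crossentropy",
          optimizer = "adam",
          metrics = "accuracy")
```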

0 Answers:

There are no answers.