I am trying to build a sentiment analysis system using word2vec and multinomial logistic regression.
Here is the R code:
library(tidyverse)
library(text2vec)
library(caret)
library(glmnet)
library(ggrepel)
Train_classifier <- read.csv('IRC2.csv',header=T, sep=";")
# keep only 4 columns of the data frame
Train <- Train_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]
# delete rows that have an empty value in any column
subTrain <- Train[rowSums(Train == '') == 0,]
subTrain$ID <- seq.int(nrow(subTrain))
# recode the scores into 4 classes: 0-4 -> 0, 5-6 -> 1, 7-8 -> 2, anything else -> 3
subTrain$Note.Reco = ifelse(subTrain$Note.Reco >= 0 & subTrain$Note.Reco <= 4, 0,
                     ifelse(subTrain$Note.Reco >= 5 & subTrain$Note.Reco <= 6, 1,
                     ifelse(subTrain$Note.Reco >= 7 & subTrain$Note.Reco <= 8, 2, 3)))
#Data pre processing
#Doc2Vec
prep_fun <- tolower
tok_fun <- word_tokenizer
subTrain[] <- lapply(subTrain, as.character)
reason <- subTrain[['Raison.Reco']][1:10]
it_train <- itoken(subTrain$Raison.Reco,
                   preprocessor = prep_fun,
                   tokenizer = tok_fun,
                   ids = subTrain$ID,
                   progressbar = TRUE)
# create the vocabulary and the document-term matrix
### training set
vocab <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)
dtm_train <- create_dtm(it_train, vectorizer)
## define the tf-idf model
tfidf <- TfIdf$new()
# fit the model to the train data and transform it with the fitted model
dtm_train_tfidf <- fit_transform(dtm_train, tfidf)
glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
                               y = subTrain[['Note.Reco']],
                               family = 'multinomial',
                               type.multinomial = "grouped")
#plot(glmnet_classifier)
Here is the structure of the data frame subTrain:
  Note.Reco                            Raison.Reco DATE_SAISIE    idpart ID
2         3 Toujours bien suivi par mon conseiller  19/03/2014 102853645  1
3         2                            Bon accueil  19/03/2014   1072309  2
4         3                     je suis satisfaite  19/03/2014    191391  3
6         1                           satisfait !!  19/03/2014     14529  4
7         3            satisfait de ma conseillère  19/03/2014 100065501  5
But when I run this code, I get this error:
> glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
+ y = subTrain[['Note.Reco']], family = 'multinomial',type.multinomial = "grouped")
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
one multinomial or binomial class has 1 or 0 observations; not allowed
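As you can see in the data frame, I don't have any column containing 0 and 1 values, so I don't understand why this error occurs.
Can you help me solve this problem? Thanks.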
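One thing that may be worth checking: the message appears to refer to the number of observations in each class of the response y, not to columns holding 0 or 1 values. The sketch below counts observations per class with table() and flags any class that has fewer than 2. The vector y is a made-up stand-in for subTrain[['Note.Reco']] after recoding, not the real data, and dropping the rare classes is only one possible way to handle them.

```r
# Hypothetical toy labels standing in for subTrain[['Note.Reco']] (not the real data)
y <- c("3", "2", "3", "1", "3", "2", "0")

# Count how many observations fall into each class
class_counts <- table(y)
print(class_counts)

# Classes with fewer than 2 observations are the ones the error complains about
rare <- names(class_counts)[class_counts < 2]

# One possible workaround (an assumption, not the only option):
# drop the rows belonging to those classes before fitting
keep <- !(y %in% rare)
y_kept <- y[keep]
```

On the real data the same logical vector keep would also have to be applied to the rows of dtm_train_tfidf so that x and y stay aligned. Merging a rare class into a neighbouring one is another option if the rows should not be lost.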