R: Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one multinomial or binomial class has 1 or 0 observations; not allowed

Asked: 2017-03-17 09:52:20

Tags: r logistic-regression sentiment-analysis word2vec

I am trying to build a sentiment analysis system using word2vec and multinomial logistic regression.

Here is the R code:

library(tidyverse)
library(text2vec)
library(caret)
library(glmnet)
library(ggrepel)

Train_classifier <- read.csv('IRC2.csv',header=T, sep=";")


# keep only 4 columns of the data frame

Train <- Train_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]

# drop rows that contain an empty value in any column
subTrain <- Train[rowSums(Train == '') == 0,]
subTrain$ID <- seq.int(nrow(subTrain))

# recode Note.Reco into 4 classes: 0-4 -> 0, 5-6 -> 1, 7-8 -> 2, anything else -> 3
subTrain$Note.Reco <- ifelse(subTrain$Note.Reco >= 0 & subTrain$Note.Reco <= 4, 0,
                      ifelse(subTrain$Note.Reco >= 5 & subTrain$Note.Reco <= 6, 1,
                      ifelse(subTrain$Note.Reco >= 7 & subTrain$Note.Reco <= 8, 2, 3)))



# Data preprocessing
# Doc2Vec

prep_fun <- tolower
tok_fun <- word_tokenizer

subTrain[] <- lapply(subTrain, as.character)

reason <- subTrain[['Raison.Reco']][1:10]


it_train <- itoken(subTrain$Raison.Reco, 
                   preprocessor = prep_fun, 
                   tokenizer = tok_fun,
                   ids = subTrain$ID,
                   progressbar = TRUE)





# create the vocabulary and the document-term matrix
  ### training file
vocab <- create_vocabulary(it_train)
vectorizer <- vocab_vectorizer(vocab)
dtm_train <- create_dtm(it_train, vectorizer)




## Define the tf-idf model

tfidf <- TfIdf$new()
# fit the model to the train data and transform it with the fitted model
dtm_train_tfidf <- fit_transform(dtm_train, tfidf)


glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
                               y = subTrain[['Note.Reco']], family = 'multinomial',type.multinomial = "grouped")


#plot(glmnet_classifier)
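
The recoding step can be checked in isolation on a small vector. The sketch below assumes a made-up numeric vector `scores` on a 0-10 scale (not taken from IRC2.csv) and compares the nested `ifelse` with an equivalent `cut()` call; the two only agree for integer scores between 0 and 10.

```r
# Hypothetical scores on the 0-10 scale; not taken from IRC2.csv
scores <- c(0, 4, 5, 6, 7, 8, 9, 10)

# Nested ifelse, same logic as in the code above
nested <- ifelse(scores >= 0 & scores <= 4, 0,
          ifelse(scores >= 5 & scores <= 6, 1,
          ifelse(scores >= 7 & scores <= 8, 2, 3)))

# Same binning with cut(): intervals (-Inf,4], (4,6], (6,8], (8,Inf]
binned <- cut(scores, breaks = c(-Inf, 4, 6, 8, Inf), labels = c(0, 1, 2, 3))

# Number of observations that land in each recoded class
print(table(binned))
```

If the real column is a factor or a character vector rather than numeric, the comparisons may not behave like this, so checking `class(subTrain$Note.Reco)` before recoding is probably worthwhile.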

Here is the structure of the data frame subTrain:

  Note.Reco                            Raison.Reco DATE_SAISIE    idpart ID
2         3 Toujours bien suivi par mon conseiller  19/03/2014 102853645  1
3         2                            Bon accueil  19/03/2014   1072309  2
4         3                     je suis satisfaite  19/03/2014    191391  3
6         1                           satisfait !!  19/03/2014     14529  4
7         3            satisfait de ma conseillère  19/03/2014 100065501  5

But when I run this code, I get this error:

> glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
+                                y = subTrain[['Note.Reco']], family = 'multinomial',type.multinomial = "grouped")
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  one multinomial or binomial class has 1 or 0 observations; not allowed

As you can see in the data frame, I don't have any columns with 0 and 1 values, so I don't know why this error occurs.
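
For what it's worth, the message is about how many observations each class of y has, not about 0/1 values in the data, so a per-class count is a natural first check. A minimal sketch, assuming a made-up label vector `y` standing in for subTrain[['Note.Reco']] (the values below are hypothetical, not my real data):

```r
# Hypothetical labels standing in for subTrain[['Note.Reco']]
y <- c("3", "2", "3", "1", "3", "2", "3", "0")

# Number of observations per class
class_counts <- table(y)
print(class_counts)

# Classes with 1 or 0 observations, which is what the error message refers to
rare_classes <- names(class_counts)[class_counts <= 1]
print(rare_classes)
```

On the real data the same check would be `table(subTrain[['Note.Reco']], useNA = "ifany")`. Since cv.glmnet refits on subsets of the rows, classes with only a handful of observations may be worth looking at as well.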

Could you help me solve this problem?

Thanks

0 answers:

No answers yet