Naive Bayes model predicts nothing when applied: predict() returns a factor with 0 levels

Time: 2018-06-16 22:57:03

Tags: r text-mining naivebayes

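My dataset looks like the one below. I followed the "Classification using Naive Bayes" tutorial to build a Naive Bayes model for text mining. However, even though the model is built, I cannot get predictions out of it: the predict function returns a factor with 0 levels. My dataset and the code so far are below.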

**Dataset:**
lie sentiment   review                                                                                  
f   n   'Mike\'s Pizza High Point    NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza   not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.'                                                                           
f   n   'i really like this buffet restaurant in Marshall street. they have a lot of selection of american   japanese    and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.'                                                                          
f   n   'After I went shopping with some of my friend    we went to DODO restaurant for dinner. I found worm in one of the dishes .'                                                                                
f   n   'Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it    and the waitor had no manners whatsoever. Don\'t go to the Olive Oil Garden. '                                                                             
f   n   'The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well   never more. '                                                                              
f   n   'I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\'t acknowledge the coupon. When I asked her about it     she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall    I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is     otherwise   my favorite place to dine. '                                                                   
f   n   'I went to ABC restaurant two days ago and I hated the food and the service. We were kept waiting for over an hour just to get seated and once we ordered    our food came out cold. I ordered the pasta and it was terrible - completely bland and very unappatizing. I definitely would not recommend going there  especially if you\'re in a hurry!'                                                                         
f   n   'I went to the Chilis on Erie Blvd and had the worst meal of my life. We arrived and waited 5 minutes for a hostess  and then were seated by a waiter who was obviously in a terrible mood. We order drinks and it took them 15 minutes to bring us both the wrong beers which were barely cold. Then we order an appetizer and wait 25 minutes for cold southwest egg rolls     at which point we just paid and left. Don\'t go.'                                                                          
f   n   'OMG. This restaurant is horrible. The receptionist did not greet us     we just stood there and waited for five minutes. The food came late and served not warm. Me and my pet ordered a bowl of salad and a cheese pizza. The salad was not fresh  the crust of a pizza was so hard like plastics. My dog didn\'t even eat that pizza. I hate this place!!!!!!!!!!'   

dput(DF)

> dput(head(lie))
structure(list(lie = c("f", "f", "f", "f", "f", "f"), sentiment = c("n", 
"n", "n", "n", "n", "n"), review = c("Mike\\'s Pizza High Point, NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza, not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.", 
"i really like this buffet restaurant in Marshall street. they have a lot of selection of american, japanese, and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.", 
"After I went shopping with some of my friend, we went to DODO restaurant for dinner. I found worm in one of the dishes .", 
"Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it, and the waitor had no manners whatsoever. Don\\'t go to the Olive Oil Garden. ", 
"The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well, never more. ", 
"I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\\'t acknowledge the coupon. When I asked her about it, she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall, I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is, otherwise, my favorite place to dine. "
)), .Names = c("lie", "sentiment", "review"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000000180788>)

R code:

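library(data.table) # fread
library(tm)         # VectorSource, Corpus, tm_map, DocumentTermMatrix
library(e1071)      # naiveBayes
library(gmodels)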

lie<- fread('deception.csv',header = T,fill = T,quote = "\'")
str(lie)
lie
#Corpus Building
words.vec<- VectorSource(lie$review)
words.corpus<- Corpus(words.vec)
words.corpus<-tm_map(words.corpus,content_transformer(tolower)) #lower case
words.corpus<-tm_map(words.corpus,removePunctuation) # remove punctuation
words.corpus<-tm_map(words.corpus,removeNumbers) # remove numbers
words.corpus<-tm_map(words.corpus,removeWords,stopwords('english')) # remove stopwords
words.corpus<-tm_map(words.corpus,stripWhitespace) # remove unnecessary whitespace

#==========================================================================
#Document term Matrix
dtm<-DocumentTermMatrix(words.corpus)
dtm
class(dtm)

#dtm_df<-as.data.frame(as.matrix(dtm))
#class(dtm_df)

freq <- colSums(as.matrix(dtm))
length(freq)
ord <- order(freq,decreasing=TRUE)
freq[head(ord)]
freq[tail(ord)]

#===========================================================================
#Data frame partition
#Splitting DTM

dtm_train <- dtm[1:61, ]
dtm_test <- dtm[62:92, ]

train_labels <- lie[1:61, ]$lie
test_labels <-lie[62:92, ]$lie

str(train_labels)
str(test_labels)

prop.table(table(train_labels))
prop.table(table(test_labels))


freq_words <- findFreqTerms(dtm_train, 10)
freq_words
dtm_freq_train<- dtm_train[ , freq_words]
dtm_freq_test <- dtm_test[ , freq_words]
dtm_freq_test


# Recode term counts as a presence/absence indicator
convert_counts <- function(x) {
  ifelse(x > 0, "Yes", "No")
}

train <- apply(dtm_freq_train, MARGIN = 2, convert_counts)
test <- apply(dtm_freq_test, MARGIN = 2, convert_counts)
str(test)


nb_classifier<-naiveBayes(train,train_labels)
nb_classifier

test_pred<-predict(nb_classifier,test)

Thanks in advance for your help.

1 answer:

Answer 0 (score: 1)

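Naive Bayes needs the response variable to be a categorical (factor) variable. Convert the lie column of the lie data frame to a factor and rerun the analysis: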

lie$lie <- as.factor(lie$lie)
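
To see why the conversion matters, here is a minimal base-R sketch. It assumes (I have not checked every e1071 release) that the classifier stores `levels(y)` of the labels and that predict builds its result from those stored levels. The label values are made up for illustration and are not taken from the dataset.

```r
# Made-up labels standing in for lie$lie as read by fread (character column).
labels_chr <- c("f", "f", "t", "t", "f")

# A character vector carries no level information at all.
chr_levels <- levels(labels_chr)
print(is.null(chr_levels))

# Converting to a factor records the distinct classes as levels.
labels_fct <- as.factor(labels_chr)
print(levels(labels_fct))

# Assumption: predictions are built by indexing into the stored levels.
# With NULL levels, that yields an empty factor with zero levels,
# which matches the symptom in the question.
empty_pred <- factor(chr_levels[c(1, 2)], levels = chr_levels)
print(nlevels(empty_pred))
```

Note that in the question `train_labels` is taken from `lie$lie`, so the conversion has to run before the train/test split. Alternatively, call `as.factor()` on `train_labels` directly before passing it to `naiveBayes()`.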