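My dataset is shown below. I followed the "Classification using Naive Bayes" tutorial to build a Naive Bayes model for text classification. However, even though the model builds, I cannot get any predictions out of it: the `predict` function returns a factor with 0 levels. Below are my dataset and the code I have so far.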
**Dataset:**
lie sentiment review
f n 'Mike\'s Pizza High Point NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.'
f n 'i really like this buffet restaurant in Marshall street. they have a lot of selection of american japanese and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.'
f n 'After I went shopping with some of my friend we went to DODO restaurant for dinner. I found worm in one of the dishes .'
f n 'Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it and the waitor had no manners whatsoever. Don\'t go to the Olive Oil Garden. '
f n 'The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well never more. '
f n 'I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\'t acknowledge the coupon. When I asked her about it she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is otherwise my favorite place to dine. '
f n 'I went to ABC restaurant two days ago and I hated the food and the service. We were kept waiting for over an hour just to get seated and once we ordered our food came out cold. I ordered the pasta and it was terrible - completely bland and very unappatizing. I definitely would not recommend going there especially if you\'re in a hurry!'
f n 'I went to the Chilis on Erie Blvd and had the worst meal of my life. We arrived and waited 5 minutes for a hostess and then were seated by a waiter who was obviously in a terrible mood. We order drinks and it took them 15 minutes to bring us both the wrong beers which were barely cold. Then we order an appetizer and wait 25 minutes for cold southwest egg rolls at which point we just paid and left. Don\'t go.'
f n 'OMG. This restaurant is horrible. The receptionist did not greet us we just stood there and waited for five minutes. The food came late and served not warm. Me and my pet ordered a bowl of salad and a cheese pizza. The salad was not fresh the crust of a pizza was so hard like plastics. My dog didn\'t even eat that pizza. I hate this place!!!!!!!!!!'
dput(DF)
> dput(head(lie))
structure(list(lie = c("f", "f", "f", "f", "f", "f"), sentiment = c("n",
"n", "n", "n", "n", "n"), review = c("Mike\\'s Pizza High Point, NY Service was very slow and the quality was low. You would think they would know at least how to make good pizza, not. Stick to pre-made dishes like stuffed pasta or a salad. You should consider dining else where.",
"i really like this buffet restaurant in Marshall street. they have a lot of selection of american, japanese, and chinese dishes. we also got a free drink and free refill. there are also different kinds of dessert. the staff is very friendly. it is also quite cheap compared with the other restaurant in syracuse area. i will definitely coming back here.",
"After I went shopping with some of my friend, we went to DODO restaurant for dinner. I found worm in one of the dishes .",
"Olive Oil Garden was very disappointing. I expect good food and good service (at least!!) when I go out to eat. The meal was cold when we got it, and the waitor had no manners whatsoever. Don\\'t go to the Olive Oil Garden. ",
"The Seven Heaven restaurant was never known for a superior service but what we experienced last week was a disaster. The waiter would not notice us until we asked him 4 times to bring us the menu. The food was not exceptional either. It took them though 2 minutes to bring us a check after they spotted we finished eating and are not ordering more. Well, never more. ",
"I went to XYZ restaurant and had a terrible experience. I had a YELP Free Appetizer coupon which could be applied upon checking in to the restaurant. The person serving us was very rude and didn\\'t acknowledge the coupon. When I asked her about it, she rudely replied back saying she had already applied it. Then I inquired about the free salad that they serve. She rudely said that you have to order the main course to get that. Overall, I had a bad experience as I had taken my family to that restaurant for the first time and I had high hopes from the restaurant which is, otherwise, my favorite place to dine. "
)), .Names = c("lie", "sentiment", "review"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000000180788>)
**R code:**
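library(data.table) # fread
library(tm)         # VectorSource, Corpus, tm_map, DocumentTermMatrix, findFreqTerms
library(e1071)      # naiveBayes
library(gmodels)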
lie<- fread('deception.csv',header = T,fill = T,quote = "\'")
str(lie)
lie
#Corpus Building
words.vec<- VectorSource(lie$review)
words.corpus<- Corpus(words.vec)
words.corpus<-tm_map(words.corpus,content_transformer(tolower)) #lower case
words.corpus<-tm_map(words.corpus,removePunctuation) # remove punctuation
words.corpus<-tm_map(words.corpus,removeNumbers) # remove numbers
words.corpus<-tm_map(words.corpus,removeWords,stopwords('english')) # remove stopwords
words.corpus<-tm_map(words.corpus,stripWhitespace) # remove unnecessary whitespace
#==========================================================================
#Document term Matrix
dtm<-DocumentTermMatrix(words.corpus)
dtm
class(dtm)
#dtm_df<-as.data.frame(as.matrix(dtm))
#class(dtm_df)
freq <- colSums(as.matrix(dtm))
length(freq)
ord <- order(freq,decreasing=TRUE)
freq[head(ord)]
freq[tail(ord)]
#===========================================================================
#Data frame partition
#Splitting DTM
dtm_train <- dtm[1:61, ]
dtm_test <- dtm[62:92, ]
train_labels <- lie[1:61, ]$lie
test_labels <-lie[62:92, ]$lie
str(train_labels)
str(test_labels)
prop.table(table(train_labels))
prop.table(table(test_labels))
freq_words <- findFreqTerms(dtm_train, 10)
freq_words
dtm_freq_train<- dtm_train[ , freq_words]
dtm_freq_test <- dtm_test[ , freq_words]
dtm_freq_test
convert_counts <- function(x) {
  # Recode term counts as presence/absence labels
  ifelse(x > 0, 'Yes', 'No')
}
train <- apply(dtm_freq_train, MARGIN = 2, convert_counts)
test <- apply(dtm_freq_test, MARGIN = 2, convert_counts)
str(test)
nb_classifier<-naiveBayes(train,train_labels)
nb_classifier
test_pred<-predict(nb_classifier,test)
Thanks in advance for your help.
Answer 0 (score: 1)
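Naive Bayes needs the response variable to be a categorical (factor) variable. As the `dput` output shows, `fread` read the `lie` column as a character vector. Convert the `lie` column of the `lie` data frame to a factor and rerun the analysis: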
lie$lie <- as.factor(lie$lie)
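
To see why the conversion matters, here is a minimal sketch on a toy data frame. The values and the `good` and `slow` columns are hypothetical stand-ins, not rows from `deception.csv`, and the last step assumes the `e1071` package is installed (it is skipped otherwise). It only illustrates that a character vector carries no levels while a factor does; it is not a reproduction of the original run.

```r
# Toy stand-in for the question's data: hypothetical values, not rows from deception.csv.
toy <- data.frame(
  lie  = c("f", "f", "t", "t", "f", "t"),
  good = factor(c("No", "No", "Yes", "Yes", "No", "Yes"), levels = c("No", "Yes")),
  slow = factor(c("Yes", "Yes", "No", "No", "Yes", "No"), levels = c("No", "Yes")),
  stringsAsFactors = FALSE
)

# A character vector has no levels attribute, so there are no class labels to report.
char_levels <- levels(toy$lie)      # NULL

# After conversion the class labels are stored on the vector itself.
toy$lie <- as.factor(toy$lie)
factor_levels <- levels(toy$lie)    # "f" "t"

print(char_levels)
print(factor_levels)

# Optional check with e1071 (assumed installed; skipped otherwise).
if (requireNamespace("e1071", quietly = TRUE)) {
  features <- toy[, c("good", "slow")]
  model <- e1071::naiveBayes(features, toy$lie)
  pred <- predict(model, features)
  # With factor labels, the predictions carry the class levels.
  print(levels(pred))
}
```

In the question's script, `train_labels` and `test_labels` are copied out of `lie` before the model is fitted, so they have to be recreated after the conversion, or the conversion has to be placed right after the `fread` call.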