The text data in my CSV file is as follows:
review1 - "the gps does not work",
review2 - "tracking of phone is inconsistent",
review3 - "the battery is draining fast",
review4 - "the tracks disappear after some time",
review5 - "the app consumes the battery lot because of gps"
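Now I want to extract the feature mentioned in each review, for example "gps", "tracking", "battery", "tracks", "battery gps", and add it to the CSV file as a label, so that one more column, "feature", is created in the CSV file. My CSV will then have two columns, a review column and a feature column, where the feature column highlights the features mentioned in the review. A snapshot of the data in the CSV is as follows: [image: new csv file data]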
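To make the target output concrete, here is a minimal sketch of the two-column CSV described above. It assumes the feature words are known in advance and simply matches them against each review, so it is a plain keyword lookup rather than a trained model. The `keywords` vector, the `find_features` helper and the output file name are all hypothetical and only serve as illustration.

```r
# Reviews as they appear in the CSV
reviews <- c("the gps does not work",
             "tracking of phone is inconsistent",
             "the battery is draining fast",
             "the tracks disappear after some time",
             "the app consumes the battery lot because of gps")

# Hypothetical keyword list; assumed to be known in advance
keywords <- c("battery", "gps", "tracking", "tracks")

# Hypothetical helper: keep the keywords that occur as whole words in a review
find_features <- function(text, keywords) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  paste(keywords[keywords %in% words], collapse = " ")
}

# One row per review: the review text and the features found in it
out <- data.frame(review  = reviews,
                  feature = vapply(reviews, find_features, character(1),
                                   keywords = keywords, USE.NAMES = FALSE),
                  stringsAsFactors = FALSE)

# "reviews_with_features.csv" is a placeholder file name
write.csv(out, "reviews_with_features.csv", row.names = FALSE)
```

With this keyword list, the fifth review gets the label "battery gps" because both words occur in it, which is the kind of combined label I am after.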
# Feature prediction
library(tm)     # VCorpus, tm_map, DocumentTermMatrix
library(e1071)  # naiveBayes
texts <- c("the gps does not work",
"tracking of phone is inconsistent",
"the battery is draining fast",
"the tracks disappear after some time",
"the app consumes the battery a lot")
features <- c("gps", "tracking", "battery", "tracks","battery")
docs <- VCorpus(VectorSource(texts))
# Clean corpus
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)
dtm <- DocumentTermMatrix(docs)
# Convert the DTM to a matrix, then to a data frame (easier to work with)
mat.df <- as.data.frame(as.matrix(dtm), stringsAsFactors = FALSE)
# Column bind category (known classification)
mat.df <- cbind(mat.df, features)
# Split rows at random into two roughly equal portions (train and test data)
set.seed(1)  # fix the seed so the random split is reproducible
train <- sample(nrow(mat.df), ceiling(nrow(mat.df) * .50))
test <- (1:nrow(mat.df))[- train]
# Isolate the class labels as a factor, so predictions and actuals share the same levels
cl <- factor(mat.df[, "features"])
# Create model data and remove "features"
modeldata <- mat.df[,!colnames(mat.df) %in% "features"]
feature_pred <- naiveBayes(modeldata[train,], cl[train])
naiv_pred <- predict(feature_pred, modeldata[test,])
# Confusion matrix and accuracy (in percent) on the test rows
conf.mat <- table("Predictions" = naiv_pred, Actual = cl[test])
(accuracy <- sum(diag(conf.mat))/length(test) * 100)