Error message when running sentiment analysis

Posted: 2014-04-16 22:59:34

Tags: r sentiment-analysis stemming

I ran sentiment analysis on my dataset and got this error message:

"Error in structure(if (length(n)) n else NA, names = x) :
  'names' attribute [2] must be the same length as the vector [1]"

Please help!

library(tm)
library(SnowballC)  # stemDocument relies on the SnowballC stemmer
myCorpus<-Corpus(VectorSource(Datasetlow_cost_airline$text))
# Convert to lower case
myCorpus<-tm_map(myCorpus,content_transformer(tolower))
# Remove punctuation
myCorpus<-tm_map(myCorpus,removePunctuation)
# Remove numbers
myCorpus<-tm_map(myCorpus,removeNumbers)
# Remove URLs (see ?regex and ?gsub for the pattern syntax)
removeURL<-function(x)gsub("http[[:alnum:]]*","",x)
myCorpus<-tm_map(myCorpus,content_transformer(removeURL))
# Inspect the built-in English stop word list
stopwords("english")
# Add three extra stop words: 'available', 'via' and 'can'
myStopwords<-c(stopwords("english"),"available","via","can")
# Remove stopwords from corpus
myCorpus<-tm_map(myCorpus,removeWords,myStopwords)
# Keep a copy of corpus to use later as a dictionary for stem completion
myCorpusCopy<-myCorpus
# Stem words (reduce each word to its root form)
myCorpus<-tm_map(myCorpus,stemDocument)
# Inspect documents (tweets) numbered 11 to 15
for(i in 11:15){
  cat(paste("[[",i,"]]",sep=""))
  writeLines(strwrap(as.character(myCorpus[[i]]),width=73))
}
# Stem completion (this is the call that produces the error above)
myCorpus<-tm_map(myCorpus,stemCompletion,dictionary=myCorpusCopy)
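A side note on removeURL: the pattern "http[[:alnum:]]*" only deletes "http" plus the letters and digits directly after it. It works in the pipeline above because removePunctuation runs first and collapses a URL such as http://t.co/abc123 into a single alphanumeric token. The short base-R check below illustrates this; the tweet text is made up, and the gsub call on [[:punct:]] is only an assumed stand-in for tm's removePunctuation on this simple ASCII example.

```r
# Same pattern as removeURL in the question
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)

# Made-up sample tweet
raw_tweet <- "cheap fares http://t.co/abc123 today"

# Assumed base-R stand-in for tm's removePunctuation
no_punct <- gsub("[[:punct:]]+", "", raw_tweet)

removeURL(raw_tweet)  # only "http" is removed
# [1] "cheap fares ://t.co/abc123 today"
removeURL(no_punct)   # the whole collapsed URL token is removed
# [1] "cheap fares  today"
```

Run on raw text, the same function would leave "://t.co/..." fragments behind, so the order of the tm_map steps matters here.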

1 answer:

Answer 0 (score: 1)

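The stemCompletion function in version 0.6 of tm seems to behave strangely. stemCompletion expects a character vector of word stems, but tm_map hands it one whole document at a time. There is a nice workaround in this answer that splits each document into words first, and I used it here. In short, replace your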

# Stem completion
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy) # use spaces!

with

# Stem completion
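stemCompletion_mod <- function(x, dict) {
  # split the document text into individual words
  words <- unlist(strsplit(as.character(x), " "))
  # complete each stem against the dictionary, picking the shortest match
  completed <- stemCompletion(words, dictionary = dict, type = "shortest")
  # join the words again and wrap the result as a document
  PlainTextDocument(stripWhitespace(paste(completed, sep = "", collapse = " ")))
}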
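# apply the workaround function to every document;
# lapply returns a plain list of documents, not a tm corpus
myCorpus <- lapply(myCorpus, stemCompletion_mod, myCorpusCopy)
# turn the list back into a corpus if later tm functions need one
myCorpus <- VCorpus(VectorSource(sapply(myCorpus, as.character)))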
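The workaround splits each document into words, completes every stem against a dictionary and joins the words back together. The same idea can be sketched in base R without tm, for example to test on a few strings. complete_stems_shortest and complete_document are hypothetical helpers; prefix matching with startsWith is an assumption about how completion works, keeping unmatched stems unchanged is my own choice (tm's stemCompletion may treat them differently), and the dictionary is made-up sample data.

```r
# Hypothetical helper: replace each stem with the shortest dictionary word
# that starts with it (assumed prefix matching, similar in spirit to
# type = "shortest"); stems without any match are kept as they are.
complete_stems_shortest <- function(stems, dictionary) {
  vapply(stems, function(stem) {
    candidates <- dictionary[startsWith(dictionary, stem)]
    if (length(candidates) == 0) return(stem)
    candidates[which.min(nchar(candidates))]
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical per-document wrapper mirroring stemCompletion_mod:
# split on whitespace, complete each word, then join with single spaces.
complete_document <- function(text, dictionary) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words <- words[words != ""]
  paste(complete_stems_shortest(words, dictionary), collapse = " ")
}

# Made-up dictionary, standing in for the unstemmed corpus copy
dict <- c("flight", "flights", "delayed", "delay", "airline", "airlines")

complete_document("flight delay airlin", dict)
# [1] "flight delay airline"
```

Working on a plain character vector per document is exactly what avoids the original error: the completion function only ever sees individual words, never a whole document object.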

If this doesn't help, you will need to provide more details and a sample of your actual data.