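In this program I am trying to read n tweets from a particular user and display the tweet data after processing. The problem is that it runs fine when I set the number of tweets to 10.
Code snippet: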
# Tweet processing
library("twitteR")
library("tm")
tweets_process <- function() {
  tweets <- userTimeline("roypartha97", n = 100)
  tweets.df <- twListToDF(tweets)
  mycorpus <- Corpus(VectorSource(tweets.df$text))
  mycorpus <- tm_map(mycorpus, content_transformer(tolower))
  mycorpus <- tm_map(mycorpus, removePunctuation)
  mycorpus <- tm_map(mycorpus, removeNumbers)
  removeUrl <- function(x) gsub("http[:alnum:]*", "", x)
  mycorpus <- tm_map(mycorpus, removeUrl)
  mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
  mycorpusCopy <- mycorpus
  mycorpus <- tm_map(mycorpus, stemDocument, language = "english", lazy = TRUE)
  for (i in 1:5)
  {
    cat(paste("[", i, "]", sep = ""))
    writeLines(mycorpus[[i]])
  }
  #mycorpus<-tm_map(mycorpus,stemCompletion,dictionary=mycorpusCopy,lazy=TRUE)
  #tdm<-TermDocumentMatrix(mycorpus,control=list(wordLengths=c(1,Inf)))
  #print(tdm)
}
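A side note on the `removeUrl` pattern in the snippet above: in R's default regex engine, `[:alnum:]` only acts as a POSIX class when it sits inside another pair of brackets, as in `[[:alnum:]]`. The base R sketch below compares the two spellings. The function names and the sample string are made up for illustration, and this is not confirmed as the cause of the error reported here.

```r
# Pattern from the snippet: without outer brackets, [:alnum:] is read as the
# set of characters ':', 'a', 'l', 'n', 'u', 'm', not as "any alphanumeric".
remove_url_original <- function(x) gsub("http[:alnum:]*", "", x)

# Hypothetical corrected variant: POSIX class nested inside a bracket expression.
remove_url_fixed <- function(x) gsub("http[[:alnum:]]*", "", x)

# Made-up sample: a URL as it might look after tolower, removePunctuation
# and removeNumbers have already run.
sample_text <- "new post httptcoabc today"

print(remove_url_original(sample_text))
print(remove_url_fixed(sample_text))
```

With the corrected class, the whole `httptcoabc` token is removed. The original pattern stops right after `http`, because `t` is not in the set `:alnum`.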
However, when I change the number of tweets from 10 to 100, I get these errors:
[1]Error in UseMethod("stemDocument", x) :
no applicable method for 'stemDocument' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(content(x), FUN, ...) :
scheduled core 1 encountered error in user code, all values of the job will be affected
2: In mclapply(content(x), FUN, ...) :
scheduled core 1 encountered error in user code, all values of the job will be affected
3: In mclapply(content(x), FUN, ...) :
scheduled core 1 encountered error in user code, all values of the job will be affected
4: In mclapply(content(x), FUN, ...) :
scheduled core 1 encountered error in user code, all values of the job will be affected
>
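The snippet passes the custom `removeUrl` function to `tm_map()` directly, while `tolower` is wrapped in `content_transformer()`. A commonly recommended pattern with a `VCorpus` is to wrap custom character functions the same way, so that each element stays a text document. The sketch below assumes a `VCorpus` built from made-up strings instead of real tweets. Whether this resolves the specific error above is not confirmed by the post.

```r
library(tm)

# Made-up texts standing in for tweets.df$text.
texts <- c("new post httptcoabc today", "no link here")
corpus <- VCorpus(VectorSource(texts))

# Hypothetical URL stripper, with the POSIX class written as [[:alnum:]].
removeUrl <- function(x) gsub("http[[:alnum:]]*", "", x)

# content_transformer() wraps a character function so that tm_map() returns
# text documents rather than bare character vectors.
cleaned <- tm_map(corpus, content_transformer(removeUrl))

# Each element is still a PlainTextDocument, which later steps can dispatch on.
print(class(cleaned[[1]]))
print(content(cleaned[[1]]))
```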
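Answer 0 (score: 1)
After a lot of trial and error, I got it working by doing the cleaning in the TermDocumentMatrix creation step: I specified the cleaning options there and it works fine.
This is what I used: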
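# Note: tm control options are case-sensitive, so it is removePunctuation, not removepunctuation
tdm = TermDocumentMatrix(mycorpus, control = list(removePunctuation = TRUE, stopwords = c(stopwords("english"), customstopwords), removeNumbers = TRUE, tolower = TRUE))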
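For anyone who wants to try this approach without Twitter credentials, here is a minimal self-contained sketch of the same idea. The sample texts and the contents of `customstopwords` are assumptions (the answer does not show them), and `VCorpus` is used in place of `Corpus`. Note that the `control` option names are case-sensitive in tm, so the punctuation option is spelled `removePunctuation`.

```r
library(tm)

# Made-up texts standing in for tweets.df$text.
texts <- c("Reading 100 tweets from http://example.com today!",
           "Stemming and cleaning the tweets, 2 steps at once.")

# Hypothetical extra stop words; the answer's own list is not shown.
customstopwords <- c("today")

mycorpus <- VCorpus(VectorSource(texts))

# Cleaning is requested through the control list instead of separate
# tm_map() calls on the corpus.
tdm <- TermDocumentMatrix(
  mycorpus,
  control = list(
    removePunctuation = TRUE,
    stopwords = c(stopwords("english"), customstopwords),
    removeNumbers = TRUE,
    tolower = TRUE
  )
)

print(Terms(tdm))
```

This keeps the corpus itself untouched, so no intermediate `tm_map()` step can leave it in a state that a later step cannot handle.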