I am analyzing 500 tweets to build a wordcloud in R. I use tweepy to retrieve tweets on a given topic and write them to a .csv file, which is then read by the R program posted below.
library(tm)
library(stringr)
library(wordcloud)
tweets <- read.csv("./tweets.csv", stringsAsFactors = FALSE)
nohandles <- str_replace_all(tweets$text, "@\\w+", "")
wordCorpus <- Corpus(VectorSource(nohandles))
wordCorpus <- tm_map(wordCorpus, removePunctuation)
wordCorpus <- tm_map(wordCorpus, content_transformer(tolower))
wordCorpus <- tm_map(wordCorpus, wordLengths = c(0,Inf))
wordCorpus <- tm_map(wordCorpus, removeWords, stopwords("english"))
wordCorpus <- tm_map(wordCorpus, removeWords, c("amp", "2yo", "3yo", "4yo"))
wordCorpus <- tm_map(wordCorpus, stripWhitespace)
pal <- brewer.pal(9,"YlGnBu")
pal <- pal[-(1:4)]
set.seed(123)
wordcloud(words = wordCorpus, scale=c(5,0.1), max.words=100, random.order=FALSE,
rot.per=0.35, use.r.layout=FALSE, colors=pal)
tdm <- TermDocumentMatrix(wordCorpus)
tdm
This is the terminal output:
Loading required package: NLP
Loading required package: methods
Loading required package: RColorBrewer
Error in match.fun(FUN) : argument "FUN" is missing, with no default
Calls: tm_map -> tm_map.VCorpus -> mclapply -> lapply -> match.fun
Execution halted
The error comes from this line of the R script, which was meant to keep words of all lengths: wordCorpus <- tm_map(wordCorpus, wordLengths = c(0,Inf))
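I suspect (untested assumption on my part) that `wordLengths` is not a transformation at all but a control option of `TermDocumentMatrix`, which is why `tm_map` complains that `FUN` is missing. A minimal sketch of that idea, dropping the broken `tm_map` call:

```r
# Sketch, assuming the intent was to keep words of every length:
# pass wordLengths as a TermDocumentMatrix control option instead
# of calling tm_map without a transformation function.
tdm <- TermDocumentMatrix(wordCorpus,
                          control = list(wordLengths = c(0, Inf)))
```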
The R script works on any Twitter archive, but not on the csv file I produce with the Python code posted below, which fetches the data from Twitter. (I have not posted the API-key code here, but it obviously works correctly, otherwise the csv file of tweets would never be created.)
import csv

import tweepy
from textblob import TextBlob

# api = tweepy.API(...)  # authentication code omitted, as noted above

csvFile = open('tweets.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search,
                           q="obama",
                           rpp=100,
                           result_type="recent",
                           include_entities=True,
                           lang="en").items(500):
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)
    print("")
    csvWriter.writerow([tweet.text.encode('utf-8')])
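One detail worth noting: in Python 3, `csv.writer` expects `str`, so writing `tweet.text.encode('utf-8')` stores byte literals like `b'...'` in the file, which a downstream reader such as R's `read.csv` will see as literal text. A minimal sketch of the safer pattern, using a hard-coded list of tweet texts in place of the tweepy cursor:

```python
import csv

# Hypothetical stand-in for the texts yielded by the tweepy cursor.
tweets = ["first tweet", "caf\u00e9 tweet with unicode"]

# Open with an explicit encoding and newline='' (the csv-module
# recommendation), and write plain str values -- no .encode() call.
with open("tweets.csv", "a", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    for text in tweets:
        writer.writerow([text])
```

This keeps the file as ordinary UTF-8 text rather than stringified bytes.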