Using the tm package on data extracted from Twitter

Time: 2017-08-16 19:05:11

Tags: r twitter tm

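I have a dataset of 30 Cristiano Ronaldo tweets in R. However, when I try to clean the data with the tm package, the output shows only metadata rather than the cleaned text. Here is my code: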

## Cleaning, analysis and display of data
# Libraries needed to clean, analyse and display the data
library(tm)

# Create a DataframeSource of Ronaldo_tweets
Ronaldo_source <- DataframeSource(Ronaldo_tweets)

# Convert Ronaldo_source to a corpus 
Ronaldo_corpus <- VCorpus(Ronaldo_source)

# Function to clean corpus
clean_corpus <- function(corpus){
  tm_map(corpus, stripWhitespace)
  tm_map(corpus, removePunctuation)
  tm_map(corpus, removeNumbers)
  return(corpus)
}

# Apply customized function to Ronaldo_corpus
Ronaldo_clean <- clean_corpus(Ronaldo_corpus)

# Print Ronaldo_clean
Ronaldo_clean
content(Ronaldo_clean)
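
Two things in the code above probably explain the output. First, `tm_map` returns a new corpus and does not modify its argument in place, so the three results inside `clean_corpus` are discarded and the function returns the original corpus unchanged. Second, printing a `VCorpus`, or the list returned by `content()` on it, shows a per-document summary (metadata and character count) rather than the text. The sketch below illustrates both points. Since `Ronaldo_tweets` is not shown, it uses a hypothetical character vector `tweets` and `VectorSource` as stand-ins, and the helper name `cleaned_text` is also an assumption.

```r
library(tm)

# Hypothetical stand-in data; the original Ronaldo_tweets data frame is not shown
tweets <- c("Great  win tonight!!! 3 goals",
            "Training   done, 100% ready.")

# Build a corpus from the character vector (assumption: VectorSource instead of DataframeSource)
corpus <- VCorpus(VectorSource(tweets))

# Reassign the result of every tm_map call so each step builds on the previous one
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  # Strip whitespace last, because removing numbers can leave double spaces
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

cleaned <- clean_corpus(corpus)

# Extract the text of each document instead of printing the corpus summary
cleaned_text <- vapply(
  content(cleaned),
  function(doc) paste(as.character(doc), collapse = " "),
  character(1)
)
print(cleaned_text)
```

To inspect a single document, `content(cleaned[[1]])` or `as.character(cleaned[[1]])` should return its text. If the data stays in a data frame, note that newer tm versions appear to require `DataframeSource` input to have `doc_id` and `text` columns, so a column rename may be needed depending on the installed version.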

0 answers:

No answers