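I have a dataset of 30 Cristiano Ronaldo tweets in R. However, when I try to clean the data with the tm package, the output shows only the metadata instead of the cleaned text. Here is my code: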
## Cleaning, analysis, and display of data
# Libraries needed to clean, analyse, and display the data
library(tm)
# Create a DataframeSource of Ronaldo_tweets
Ronaldo_source <- DataframeSource(Ronaldo_tweets)
# Convert Ronaldo_source to a corpus
Ronaldo_corpus <- VCorpus(Ronaldo_source)
# Function to clean corpus
clean_corpus <- function(corpus) {
  tm_map(corpus, stripWhitespace)
  tm_map(corpus, removePunctuation)
  tm_map(corpus, removeNumbers)
  return(corpus)
}
# Apply customized function to Ronaldo_corpus
Ronaldo_clean <- clean_corpus(Ronaldo_corpus)
# Print Ronaldo_clean
Ronaldo_clean
content(Ronaldo_clean)
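
For comparison, here is a minimal sketch of what may be going on, not a definitive fix. Two things seem relevant: `tm_map()` returns a new corpus rather than modifying its argument, so in the function above each result is discarded and the original corpus is returned unchanged; and printing a `VCorpus` (or the list returned by `content()`) shows a metadata summary for each document rather than its text. The data frame `sample_tweets` below is a hypothetical stand-in for `Ronaldo_tweets`, and it assumes a recent tm version in which `DataframeSource` expects `doc_id` and `text` columns.

```r
library(tm)

# Hypothetical stand-in for Ronaldo_tweets (assumption: recent tm versions
# expect the columns doc_id and text, in that order)
sample_tweets <- data.frame(
  doc_id = c("1", "2"),
  text = c("Great   win today!!! 3 points.",
           "Training hard, 7 days a week..."),
  stringsAsFactors = FALSE
)

sample_corpus <- VCorpus(DataframeSource(sample_tweets))

# Assign the result of every tm_map() call back to corpus so that
# each cleaning step builds on the previous one
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  # Strip whitespace last, to collapse gaps left by the removals above
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

sample_clean <- clean_corpus(sample_corpus)

# Pull the text out of each document instead of printing the corpus object
cleaned_text <- sapply(sample_clean, as.character)
print(cleaned_text)
```

To inspect a single document, `content(sample_clean[[1]])` should return just that document's text.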