Here is a dummy text:
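# Create df first; assigning to df$text fails if df has not been defined yet
df <- data.frame(text = "This is just a text in order to test the term frequency matrix save result process. I would like to save all results after the term frequency process into one dataframe...", stringsAsFactors = FALSE)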
The text mining process:
library(tm)
corpusD <- Corpus(VectorSource(df$text))
myStopwords <- c("would", "e g")
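# Wrap base functions such as tolower in content_transformer() so the corpus keeps its document structure
corpusD <- tm_map(corpusD, content_transformer(tolower))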
corpusD <- tm_map(corpusD, removeWords, stopwords('english'))
corpusD <- tm_map(corpusD, removeNumbers)
corpusD <- tm_map(corpusD, removeWords, myStopwords)
corpusD <- tm_map(corpusD, stripWhitespace)
matrixD <- TermDocumentMatrix(corpusD)
I would like to convert the full result of the last step, the TermDocumentMatrix, into a data frame like this:
term frequency
frequency 2
matrix 1
But when I try to save the result to a CSV file, I only get the frequencies and not the terms. How can I do this?
Answer 0: (score: 1)
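You need to create a data.frame before saving. Once the TermDocumentMatrix is converted with as.matrix, the terms are only its row names, so copy them into a column of their own.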
df1 <- c("This is just a text in order to test the term frequency matrix save result process. I would like to save all results after the term frequency process into one dataframe...")
library(tm)
corpusD <- Corpus(VectorSource(df1))
myStopwords <- c("would", "e g")
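# Wrap base functions such as tolower in content_transformer() so the corpus keeps its document structure
corpusD <- tm_map(corpusD, content_transformer(tolower))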
corpusD <- tm_map(corpusD, removeWords, stopwords('english'))
corpusD <- tm_map(corpusD, removeNumbers)
corpusD <- tm_map(corpusD, removeWords, myStopwords)
corpusD <- tm_map(corpusD, stripWhitespace)
matrixD <- TermDocumentMatrix(corpusD)
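# Convert once: terms are the row names, and the row sums are the total count per term
m <- as.matrix(matrixD)
res <- data.frame(term = rownames(m), frequency = rowSums(m))
row.names(res) <- NULL
write.csv(res, "c:/temp/tm.csv")
# Print the data frame
res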
        term frequency
1  dataframe         1
2  frequency         2
3       just         1
4       like         1
5     matrix         1
6        one         1
7      order         1
8    process         2
9     result         1
10   results         1
11      save         2
12      term         2
13      test         1
14      text         1
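
The data frame step can also be tried without tm installed. The following is a minimal base R sketch, not part of the original answer: the small matrix `m` is a hand-built stand-in for `as.matrix(matrixD)` (four terms with the counts shown above), and `out_file` is a hypothetical temporary path used so the example runs anywhere. Sorting by frequency and passing `row.names = FALSE` are optional extras.

```r
# Stand-in for as.matrix(matrixD): terms as row names, one column per document.
# The counts are copied by hand from four rows of the printed result.
m <- matrix(c(2, 1, 2, 1),
            ncol = 1,
            dimnames = list(Terms = c("frequency", "matrix", "save", "text"),
                            Docs = "1"))

# Promote the row names to a real column so they survive the export.
res <- data.frame(term = rownames(m),
                  frequency = rowSums(m),
                  stringsAsFactors = FALSE)
row.names(res) <- NULL

# Optional: most frequent terms first, ties broken alphabetically.
res <- res[order(-res$frequency, res$term), ]

# tempfile() is used only so the sketch does not depend on a fixed path.
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, row.names = FALSE)

# Read the file back to confirm that both columns were written.
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```

With `row.names = FALSE` the CSV contains only the `term` and `frequency` columns; leaving it out, as in the answer, adds a leading column of row numbers.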