Saving a term frequency document matrix

Time: 2017-03-26 16:11:59

Tags: r

This is a dummy text:

df <- data.frame(text = "This is just a text in order to test the term frequency matrix save result process. I would like to save all results after the term frequency process into one dataframe...", stringsAsFactors = FALSE)

The text mining process:

library(tm)
corpusD <- Corpus(VectorSource(df$text))
myStopwords <- c("would", "e g")
corpusD <- tm_map(corpusD, content_transformer(tolower))
corpusD <- tm_map(corpusD, removeWords, stopwords('english'))
corpusD <- tm_map(corpusD, removeNumbers)
corpusD <- tm_map(corpusD, removeWords, myStopwords)
corpusD <- tm_map(corpusD, stripWhitespace)
matrixD <- TermDocumentMatrix(corpusD)

I would like to convert all the results of the last step, the TermDocumentMatrix, into a data frame like this:

term       frequency
frequency  2
matrix     1

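But when I try to save the results to a CSV file, I only get the frequencies and not the terms. Any idea how this could be done?

1 answer:

Answer 0 (score: 1)

You need to create a data.frame before saving, so that the terms (which are only the row names of the matrix) become a regular column: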

df1 <- c("This is just a text in order to test the term frequency matrix save result process. I would like to save all results after the term frequency process into one dataframe...")

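library(tm)
corpusD <- Corpus(VectorSource(df1))
myStopwords <- c("would", "e g")
# content_transformer() keeps the corpus structure intact when applying a base R function
corpusD <- tm_map(corpusD, content_transformer(tolower))
corpusD <- tm_map(corpusD, removeWords, stopwords('english'))
corpusD <- tm_map(corpusD, removeNumbers)
corpusD <- tm_map(corpusD, removeWords, myStopwords)
corpusD <- tm_map(corpusD, stripWhitespace)
matrixD <- TermDocumentMatrix(corpusD)

# Convert to a dense matrix once: rows are terms, columns are documents
m <- as.matrix(matrixD)
# Copy the terms out of the row names and sum the counts over all documents
res <- data.frame(term = rownames(m), frequency = rowSums(m))
# Drop the term row names so that they appear only in the "term" column
row.names(res) <- NULL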

write.csv(res,"c:/temp/tm.csv")

        term frequency
1  dataframe         1
2  frequency         2
3       just         1
4       like         1
5     matrix         1
6        one         1
7      order         1
8    process         2
9     result         1
10   results         1
11      save         2
12      term         2
13      test         1
14      text         1
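
The same idea can be tried without installing tm. The following is only a minimal sketch under stated assumptions: the small matrix `m` is a hypothetical stand-in for `as.matrix(matrixD)`, and the temporary file path is chosen for illustration. It also passes `row.names = FALSE` to `write.csv`, which is an addition to the answer above and keeps the numeric row index out of the file.

```r
# Hypothetical stand-in for as.matrix(matrixD): terms as row names, one column per document
m <- matrix(c(2, 1, 2),
            ncol = 1,
            dimnames = list(Terms = c("frequency", "matrix", "term"), Docs = "1"))

# Move the terms from the row names into a regular column
res <- data.frame(term = rownames(m),
                  frequency = unname(rowSums(m)),
                  stringsAsFactors = FALSE)

# Most frequent terms first, ties broken alphabetically
res <- res[order(-res$frequency, res$term), ]

# row.names = FALSE keeps the numeric row index out of the file
out_file <- tempfile(fileext = ".csv")  # assumed output location
write.csv(res, out_file, row.names = FALSE)

# Read the file back to confirm that both columns were written
saved <- read.csv(out_file, stringsAsFactors = FALSE)
print(saved)
```

Writing the matrix directly loses nothing in principle, but the terms then travel as row names rather than as a named column, so making them an explicit column is the more predictable choice when the CSV is read by other tools.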