Question

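My dataset has 40 rows and 3 attribute columns. Each row is a separate text document. I used the TermDocumentMatrix() function from library(tm) to split the strings into individual words, but the function treats the number of attribute columns as the number of documents. Why does this happen? Have I made a mistake?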

Is there an attribute filter in R similar to Weka's StringToWordVector filter? I would like the result to be the same as what Weka's StringToWordVector filter produces.

An example of the data:

Title, Author, BookSummary

The Da Vinci Code, Dan Brown, Louvre curator and Priory of Sion Grand Master Jacques

This example shows only 1 row.

I tried this code:

library(tm)
library(SnowballC)  # needed by stemDocument

data <- read.csv("C:/Users/admin/Desktop/RTextMining/dataset.csv")
# The whole data frame is passed to VectorSource here
corpus.tmp <- Corpus(VectorSource(data))
View(corpus.tmp)

corpus.tmp <- tm_map(corpus.tmp, removePunctuation)
corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))

corpus.tmp <- tm_map(corpus.tmp, stemDocument)

TDM <- TermDocumentMatrix(corpus.tmp)
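
A likely cause, offered as a sketch rather than a confirmed diagnosis: VectorSource() makes one document per element of its argument, and a data frame is a list of columns, so a 3-column data frame yields 3 documents. Passing only the text column should give one document per row. The small data frame below is a made-up stand-in for dataset.csv (only the first row comes from the question), and using BookSummary as the text column is an assumption. Stemming is left out to keep the example dependent on tm only.

```r
library(tm)

# Hypothetical stand-in for dataset.csv; rows 2 and 3 are invented for illustration.
data <- data.frame(
  Title = c("The Da Vinci Code", "Book Two", "Book Three"),
  Author = c("Dan Brown", "Author Two", "Author Three"),
  BookSummary = c(
    "Louvre curator and Priory of Sion Grand Master Jacques",
    "A second summary about a curator and a museum",
    "A third summary about a secret society"
  ),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so its length is the column count.
length(data)

# Pass only the text column (assumed here to be BookSummary):
# each element of the character vector becomes one document.
corpus <- VCorpus(VectorSource(data$BookSummary))

corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Documents as rows and terms as columns, similar in shape to a word-vector table.
dtm <- DocumentTermMatrix(corpus)
word_vectors <- as.data.frame(as.matrix(dtm))
```

Here word_vectors has one row per document and one column per term, which is roughly the layout a string-to-word-vector filter produces. If all three columns should contribute words, one option is to paste them into a single string per row before building the corpus.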

String to word vector in R

0 answers: