Question

I am trying to count the keywords in my corpus using the R "tm" package. This is my code so far:

# get the data strings
f<-as.vector(forum[[1]])

# replace +
f<-gsub("+", " ", f ,fixed=TRUE)

# lower case
f<-tolower(f)

# show all strings that contain mobile
mobile<- f[grep("mobile", f, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)]
text.corp.mobile <- Corpus(VectorSource(mobile))
text.corp.mobile <- tm_map(text.corp.mobile , removePunctuation) 
text.corp.mobile <- tm_map(text.corp.mobile , removeWords, c(stopwords("english"),"mobile")) 
dtm.mobile <- DocumentTermMatrix(text.corp.mobile)
dtm.mobile 
dtm.mat.mobile <- as.matrix(dtm.mobile)
dtm.mat.mobile

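This returns a table with a binary result showing whether a keyword appears in one of the corpus texts. Instead of a binary result, I would like a count for each keyword. For example: 'car' appeared 5 times, 'button' appeared 9 times.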

Answer 1

Without seeing your actual data it is a bit hard to tell, but since you just called DocumentTermMatrix I would try something like this:

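# In a DocumentTermMatrix, rows are documents and columns are terms,
# so the column sums give the total count of each term across the corpus
dtm.mat.mobile <- as.matrix(dtm.mobile)
word.freqs <- sort(colSums(dtm.mat.mobile), decreasing=TRUE)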
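To see why the per-document matrix can look binary while the totals are not, here is a minimal sketch on a made-up three-document corpus. The `docs` vector is a hypothetical stand-in for the forum data, which is not shown in the question. Each cell of the matrix holds the count of a term in a single document, and summing over the documents (the rows) gives one total per term:

```r
library(tm)

# Hypothetical toy data standing in for the forum strings
docs <- c("car button button", "button screen", "car button")

corp <- VCorpus(VectorSource(docs))
dtm  <- DocumentTermMatrix(corp)

# Rows are documents, columns are terms; each cell is a per-document count
m <- as.matrix(dtm)

# Total occurrences of each term across all documents, most frequent first
word.freqs <- sort(colSums(m), decreasing = TRUE)
print(word.freqs)
```

If the matrix had been built with TermDocumentMatrix instead, the terms would be rows and rowSums would be the call to use.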
