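I'm trying to create a document-term matrix with the tm package in R using a predefined vocabulary, but the result I get is incorrect. Here is the code: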
library(tm)
a = c("good great awesome", "aww nice great")
corpus = Corpus(VectorSource(a))
#no dictionary works as expected:
as.matrix(DocumentTermMatrix(corpus))
#now if I want my own vocabulary:
terms = c("good", "nice", "aww", "bad", "awesome")
as.matrix(DocumentTermMatrix(corpus, control = list(dictionary=terms)))
#this gives:
#Docs good nice aww bad awesome
#   1    1    1   0   0       0
#   2    0    0   1   1       0
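This is wrong: document 1 contains "good" and "awesome" but not "nice", and document 2 contains "aww" and "nice" but not "bad". Does anyone know what is going on?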
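To make the expected result concrete, here is a minimal base-R sketch (no tm involved) of the counts I would expect for this vocabulary. It assumes plain whitespace tokenization and exact matching, which is a simplification of whatever tm does internally; the names `tokens`, `counts`, and `expected` are only for illustration.

```r
# Same inputs as in the question
a <- c("good great awesome", "aww nice great")
terms <- c("good", "nice", "aww", "bad", "awesome")

# Assumption: split on single spaces; tm's own tokenizer may behave differently
tokens <- strsplit(a, " ", fixed = TRUE)

# Count each vocabulary term per document; words outside `terms` become NA
# in the factor and are dropped by table()
counts <- vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
)

# vapply returns one column per document, so transpose to documents x terms
expected <- t(counts)
dimnames(expected) <- list(Docs = as.character(seq_along(a)), Terms = terms)
expected
```

With these inputs, row 1 should have a 1 under "good" and "awesome" only, and row 2 a 1 under "nice" and "aww" only, which is not what the dictionary call above returns.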
More information about versions:
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.5 (Sierra)
tm_0.7-1