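After updating R to the latest version (R 3.4.2), I ran into a problem with the tm package.
I want to use a dictionary with a predefined list of terms (single words and compound phrases) to inspect a DocumentTermMatrix. To do this, I applied a BigramTokenizer function to my corpus with the following code: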
library(tm)     # VCorpus, VectorSource, DocumentTermMatrix, inspect
library(RWeka)  # NGramTokenizer, Weka_control
data <- c("This example shows how to apply dictionaries in text mining",
          "supervised learning methods work with training data",
          "unsupervised learning methods are great")
data.corpus <- VCorpus(VectorSource(data))
# Despite its name, this tokenizer emits 1- to 3-grams (min = 1, max = 3)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 3))
inspect(DocumentTermMatrix(data.corpus,
                           control = list(tokenize = BigramTokenizer,
                                          dictionary = c("example", "dictionaries", "text mining", "supervised learning methods"))))
After updating R, I get the following error message:
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), : 'i, j' invalid
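I tried to find a solution online, but the suggestions involving the Snowball package do not work, because that package no longer exists. Is there any way to solve this problem? Thanks.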