Converting a DocumentTermMatrix to a dgTMatrix

Date: 2018-04-14 20:12:44

Tags: r tm text2vec

I'm trying to run the AssociatedPress dataset from the tm package through text2vec's LDA implementation.

The problem I'm facing is an incompatibility of data types: AssociatedPress is a tm::DocumentTermMatrix, which in turn is a subclass of slam::simple_triplet_matrix. text2vec, however, expects the input x to text2vec::lda$fit_transform(x = ...) to be a Matrix::dgTMatrix.

My question is: is there a way to coerce a DocumentTermMatrix into something text2vec accepts?

Minimal (failing) example:

library('tm')
library('text2vec')
data("AssociatedPress", package="topicmodels")

dtm <- AssociatedPress[1:10, ]

lda_model = LDA$new(
  n_topics = 10,
  doc_topic_prior = 0.1,
  topic_word_prior = 0.01
)

doc_topic_distr = lda_model$fit_transform(
  x = dtm,
  n_iter = 1000,
  convergence_tol = 0.001,
  n_check_convergence = 25,
  progressbar = FALSE
)

...which gives:

  

base::rowSums(x, na.rm = na.rm, dims = dims, ...): 'x' must be an array of at least two dimensions

1 Answer:

Answer 0: (score: 1)

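The answer is in the duplicate linked by @Dmitriy Selivanov. However, it does not mention that the function used there comes from the Matrix package bundled with R.

Since I don't have topicmodels installed, I will use the crude dataset included in the tm package. The principle is the same.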

Code for the conversion:

library(tm)
data("crude")

dtm <- DocumentTermMatrix(crude,
                          control = list(weighting =
                                           function(x)
                                             weightTfIdf(x, normalize =
                                                           FALSE),
                                         stopwords = TRUE))

# rebuild the triplet data as a Matrix sparse matrix (class dgCMatrix)
m <-  Matrix::sparseMatrix(i=dtm$i, 
                           j=dtm$j, 
                           x=dtm$v, 
                           dims=c(dtm$nrow, dtm$ncol),
                           dimnames = dtm$dimnames)
str(m)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:1890] 6 1 18 6 6 5 9 12 9 5 ...
  ..@ p       : int [1:1201] 0 1 2 3 4 5 6 8 9 11 ...
  ..@ Dim     : int [1:2] 20 1200
  ..@ Dimnames:List of 2
  .. ..$ Docs : chr [1:20] "127" "144" "191" "194" ...
  .. ..$ Terms: chr [1:1200] "\"(it)" "\"demand" "\"expansion" "\"for" ...
  ..@ x       : num [1:1890] 4.32 4.32 4.32 4.32 4.32 ...
  ..@ factors : list()
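
The str() output above shows a dgCMatrix (compressed column storage), while the question asks for a dgTMatrix (triplet storage). A minimal sketch of one way to get from one to the other follows. The helper name stm_to_dgT and the toy list are hypothetical, made up here for illustration. The toy list only mimics the i, j, v, nrow, ncol and dimnames fields that the answer reads from the DocumentTermMatrix, so the example needs nothing beyond Matrix.

```r
library(Matrix)

# Hypothetical helper (not part of tm, slam or text2vec): rebuild a
# triplet-style object as a Matrix sparse matrix, then switch it to
# triplet storage.
stm_to_dgT <- function(stm) {
  m <- sparseMatrix(i = stm$i,
                    j = stm$j,
                    x = as.numeric(stm$v),  # force doubles so the result is a "d" matrix
                    dims = c(stm$nrow, stm$ncol),
                    dimnames = stm$dimnames)
  # dgCMatrix -> dgTMatrix; the values stay the same, only the storage changes
  as(m, "TsparseMatrix")
}

# Toy stand-in with the same fields the answer reads from the dtm (assumption:
# a real DocumentTermMatrix exposes them the same way, as in the code above)
toy <- list(i = c(1L, 1L, 2L),
            j = c(1L, 3L, 2L),
            v = c(2, 1, 4),
            nrow = 2L,
            ncol = 3L,
            dimnames = list(Docs = c("d1", "d2"), Terms = c("a", "b", "c")))

m_t <- stm_to_dgT(toy)
class(m_t)
```

With a real document-term matrix, stm_to_dgT(dtm) would stand in for the toy call, and the result should then be usable as the x argument of lda_model$fit_transform() from the question. That last step is not verified here.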