Question

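Is there a way to concatenate two dfm matrices that have different numbers of columns and rows? It can be done with some extra coding, so I am not interested in an ad hoc solution. I am looking for a general and elegant one, if such a thing exists.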

An example:

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)
rbind(dfm1, dfm2)

gives an error.

The '' package can concatenate its dfm matrices out of the box, but it is too slow for me.

Also remember that 'dfm' from 'quanteda' is an S4 class.

Answer 1

If you are using the latest version, this should work "out of the box":

packageVersion("quanteda")
## [1] ‘0.9.6.9’

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)

rbind(dfm1, dfm2)
## Document-feature matrix of: 2 documents, 6 features.
## 2 x 6 sparse Matrix of class "dfmSparse"
##      is one sample surprise text this
## doc1  1   1      2        0    1    1
## doc2  1   1      2        1    1    1

See also ?selectFeatures for the case where features is a dfm object (the help file has examples).

Added:

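Note that this correctly aligns the two texts on their common set of features, unlike the regular rbind method for matrices, which requires the columns to match. For the same reason, rbind() does not actually work in the tm package for DocumentTermMatrix objects with different terms: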

require(tm)
dtm1 <- DocumentTermMatrix(Corpus(VectorSource(c(doc1 = "This is one sample text sample."))))
dtm2 <- DocumentTermMatrix(Corpus(VectorSource(c(doc2 = "Surprise! This is one sample text sample."))))
rbind(dtm1, dtm2)
## Error in f(init, x[[i]]) : Numbers of columns of matrices must match.

This almost gets there, but seems to duplicate the repeated feature:

as.matrix(rbind(c(dtm1, dtm2)))
##     Terms
## Docs one sample sample. text this surprise!
##    1   1      1       1    1    1         0
##    1   1      1       1    1    1         1
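For anyone who wants the same alignment outside quanteda, here is a minimal sketch of the idea using only the Matrix package. Each sparse count matrix is re-indexed onto the sorted union of all column names, features that a matrix lacks are filled with zeros, and the results are then row-bound. The helpers align_features() and rbind_aligned() are hypothetical names introduced for illustration. This is not the actual implementation in quanteda or tm, and it assumes general (non-symmetric) numeric matrices that have column names.

```r
library(Matrix)

# Hypothetical helper (not part of quanteda or tm): re-index the columns of
# `m` onto the feature vector `feats`; features absent from `m` stay all-zero.
align_features <- function(m, feats) {
  tm <- as(m, "TsparseMatrix")  # triplet form; slots i and j are 0-based
  sparseMatrix(i = tm@i + 1L,
               j = match(colnames(tm)[tm@j + 1L], feats),
               x = tm@x,
               dims = c(nrow(tm), length(feats)),
               dimnames = list(rownames(tm), feats))
}

# Hypothetical helper: row-bind any number of count matrices on the sorted
# union of their column names.
rbind_aligned <- function(...) {
  mats <- list(...)
  feats <- sort(unique(unlist(lapply(mats, colnames))))
  do.call(rbind, lapply(mats, align_features, feats = feats))
}

# Toy counts mirroring the two documents in the question
# (lowercased, punctuation dropped, columns deliberately in different orders)
m1 <- sparseMatrix(i = rep(1L, 5), j = 1:5, x = c(1, 1, 2, 1, 1),
                   dims = c(1, 5),
                   dimnames = list("doc1", c("is", "one", "sample", "text", "this")))
m2 <- sparseMatrix(i = rep(1L, 6), j = 1:6, x = c(1, 1, 1, 1, 2, 1),
                   dims = c(1, 6),
                   dimnames = list("doc2", c("surprise", "this", "is", "one", "sample", "text")))

res <- rbind_aligned(m1, m2)
print(as.matrix(res))  # doc1 gets a 0 for "surprise"; all other counts are kept
```

Working on the triplet (i, j, x) form means only the nonzero entries are remapped, so apart from building the feature union, the cost grows with the number of nonzero counts rather than with the full matrix size.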

Concatenate dfm matrices in the 'quanteda' package
