Question

I already have a term-document matrix and want to add a new document to it. In other words, I want to combine two term-document matrices.

My term-document matrix is:

     Docs
Term   1
eat    7
food   2
run    2
sick   3

The other document is: watch football match and eat food

After this step, I want my term-document matrix to be:

         Docs
Term     1   2
eat      7   1
food     2   1
run      2   0
sick     3   0
watch    0   1
football 0   1
match    0   1
and      0   1

I tried this:

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))
mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)

I got:

TermDocumentMatrix (terms: 7, document:2)

Error in `[.simple_triplet_matrix`(x,terms,doc)
    Repeated indices currently not allowed.

Answer 1

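I solved it: before combining the two term-document matrices, I replaced the document name in tdm2 so that it no longer clashes with the names already in mytermdm. The complete code: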

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))

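# The added fix: give the new document the next unused number as its name.
# Convert to numeric before max(), since max() on character names compares
# them as strings (so "9" would sort above "10").
colnames(tdm2) <- as.character(max(as.numeric(colnames(mytermdm))) + 1)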


mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)
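
To make the intended result concrete, here is a minimal base R sketch of the same merge on plain matrices, without tm. It is not what c() does internally; merge_term_matrices, old, new_terms and new are hypothetical names introduced only for this illustration. It assumes terms are rows, documents are columns, and document names are unique across both inputs, which is the condition the renaming step above restores.

```r
# Hypothetical helper (not part of tm): merge two term-count matrices
# that have terms as rows and documents as columns.
merge_term_matrices <- function(a, b) {
  # Assumption: document names must not overlap between the two inputs.
  stopifnot(!any(colnames(b) %in% colnames(a)))

  # Every term seen in either matrix gets a row; missing counts stay 0.
  terms <- union(rownames(a), rownames(b))
  out <- matrix(0,
                nrow = length(terms),
                ncol = ncol(a) + ncol(b),
                dimnames = list(Terms = terms,
                                Docs = c(colnames(a), colnames(b))))

  # Copy each input into its own rows and columns by name.
  out[rownames(a), colnames(a)] <- a
  out[rownames(b), colnames(b)] <- b
  out
}

# The existing matrix from the question (document "1").
old <- matrix(c(7, 2, 2, 3), ncol = 1,
              dimnames = list(c("eat", "food", "run", "sick"), "1"))

# The new document, already renamed to "2"; each word occurs once.
new_terms <- c("watch", "football", "match", "and", "eat", "food")
new <- matrix(1, nrow = length(new_terms), ncol = 1,
              dimnames = list(new_terms, "2"))

merged <- merge_term_matrices(old, new)
print(merged)
```

Note that this sketch does no stemming or stopword removal, so its rows match the desired table in the question rather than what the tm control options above would produce.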
