I have an existing term-document matrix and want to add a new document to it; in other words, I want to combine two term-document matrices.
My term-document matrix is:
        Docs
Term    1
eat     7
food    2
run     2
sick    3
The other document is: "watch football match and eat food".
After adding it, I want my term-document matrix to be:
          Docs
Term      1  2
eat       7  1
food      2  1
run       2  0
sick      3  0
watch     0  1
football  0  1
match     0  1
and       0  1
I tried this:
library("SnowballC")
library("NLP")
library("tm")
library("lsa")
# mytermdm is the term-document matrix I already have
text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))
tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))
mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)
I got:
TermDocumentMatrix (terms: 7, document:2)
Error in `[.simple_triplet_matrix`(x, terms, doc) :
  Repeated indices currently not allowed.
Answer 0 (score: 0)
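I solved it: both matrices used the same document name, so before combining the two term-document matrices I renamed the document in tdm2. The complete code: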
library("SnowballC")
library("NLP")
library("tm")
library("lsa")
# mytermdm is the term-document matrix I already have
text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))
tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))
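# My fix: give the new document the next unused numeric name.
# Convert to numeric before max(), because max() on character names compares them as strings.
colnames(tdm2) <- max(as.numeric(colnames(mytermdm))) + 1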
mytdm3 <- c(mytermdm,tdm2)
inspect(mytdm3)
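
To make the intended result concrete, here is a minimal base R sketch of the same idea on plain count matrices, without tm. The helper merge_term_counts and the toy matrices old and new are hypothetical names introduced only for this illustration; the counts are copied from the tables in the question, with no stopword removal or stemming applied. It shows one way to rename the new document and then combine on the union of terms, not how tm does it internally.

```r
# Hypothetical helper (not part of tm): combine two term-by-document count
# matrices on the union of their terms, filling missing terms with 0.
merge_term_counts <- function(a, b) {
  terms <- union(rownames(a), rownames(b))
  out <- matrix(0, nrow = length(terms), ncol = ncol(a) + ncol(b),
                dimnames = list(Terms = terms,
                                Docs = c(colnames(a), colnames(b))))
  out[rownames(a), seq_len(ncol(a))] <- a
  out[rownames(b), ncol(a) + seq_len(ncol(b))] <- b
  out
}

# Toy data taken from the question: the existing matrix and the new document.
old <- matrix(c(7, 2, 2, 3), ncol = 1,
              dimnames = list(c("eat", "food", "run", "sick"), "1"))
new <- matrix(rep(1, 6), ncol = 1,
              dimnames = list(c("watch", "football", "match", "and",
                                "eat", "food"), "1"))

# Both documents are named "1". Rename the new one to the next free number,
# converting to numeric first so that max() does not compare strings.
colnames(new) <- as.character(max(as.numeric(colnames(old))) + 1)

combined <- merge_term_counts(old, new)
print(combined)
```

Terms that appear in only one of the two matrices get a count of 0 in the other document's column, which is the layout the question asks for.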