I have a character vector that looks like this:
charVec[1:10]
[1] "dentistry" "free" "cache" "key" "containing" "cite" "templates" "deprecated" "errors" "dates"
Then I build all 3-word combinations of the vector:
combwords <- t(combn(charVec,3))
This gives me the following matrix of combinations:
     [,1]     [,2]     [,3]
[1,] "import" "school" "dentistry"
[2,] "import" "school" "school"
[3,] "import" "school" "log"
[4,] "import" "school" "search"
[5,] "import" "school" "current"
[6,] "import" "school" "advanced"
Now I want to create a document-term matrix (DTM) in which each row of the combination matrix is one document:
word_corpus <- Corpus(VectorSource(combwords))
This does not work. How can I turn each row of the combination matrix into a single document in the corpus?
Answer 0 (score: 2)
library(tm)
# Collapse each row of the combination matrix into one space-separated string
foo <- apply(combwords, 1, paste, collapse = " ")
foo
## [1] "dentistry free cache" "dentistry free key"
## [3] "dentistry free containing" "dentistry free cite"
## [5] "dentistry cache key" "dentistry cache containing"
## [7] "dentistry cache cite" "dentistry key containing"
## [9] "dentistry key cite" "dentistry containing cite"
## [11] "free cache key" "free cache containing"
## [13] "free cache cite" "free key containing"
## [15] "free key cite" "free containing cite"
## [17] "cache key containing" "cache key cite"
## [19] "cache containing cite" "key containing cite"
# Each collapsed string becomes one document in the corpus
tt <- Corpus(VectorSource(foo))
DocumentTermMatrix(tt)
## A document-term matrix (20 documents, 6 terms)
##
## Non-/sparse entries: 60/60
## Sparsity : 50%
## Maximal term length: 10
## Weighting : term frequency (tf)
as.matrix(DocumentTermMatrix(tt))
##     Terms
## Docs cache cite containing dentistry free key
##   1      1    0          0         1    1   0
##   2      0    0          0         1    1   1
##   3      0    0          1         1    1   0
##   4      0    1          0         1    1   0
##   5      1    0          0         1    0   1
##   6      1    0          1         1    0   0
##   7      1    1          0         1    0   0
##   8      0    0          1         1    0   1
##   9      0    1          0         1    0   1
##   10     0    1          1         1    0   0
##   11     1    0          0         0    1   1
##   12     1    0          1         0    1   0
##   13     1    1          0         0    1   0
##   14     0    0          1         0    1   1
##   15     0    1          0         0    1   1
##   16     0    1          1         0    1   0
##   17     1    0          1         0    0   1
##   18     1    1          0         0    0   1
##   19     1    1          1         0    0   0
##   20     0    1          1         0    0   1
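A likely reason the original attempt fails is that an R matrix is just a vector with a dim attribute, so anything that walks over it element by element (which is presumably what VectorSource does) sees every cell as its own item rather than every row. The sketch below uses only base R to show that difference and to cross-check the term counts without tm. It assumes the first six words of charVec, as the output above suggests; the names words, docs, terms and dtm are made up for illustration.

```r
# First six words of the question's charVec (assumed input for this sketch)
words <- c("dentistry", "free", "cache", "key", "containing", "cite")

# One row per 3-word combination: choose(6, 3) = 20 rows
combwords <- t(combn(words, 3))

# A matrix is a vector with a dim attribute, so element-wise consumers
# see 60 separate strings rather than 20 rows
n_elements <- length(combwords)

# Collapse each row into a single space-separated string: one string per row
docs <- apply(combwords, 1, paste, collapse = " ")

# Base-R stand-in for a document-term matrix: count every term per document
tokens_per_doc <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens_per_doc)))
dtm <- t(vapply(
  tokens_per_doc,
  function(tokens) as.integer(table(factor(tokens, levels = terms))),
  integer(length(terms))
))
dimnames(dtm) <- list(Docs = as.character(seq_along(docs)), Terms = terms)

print(n_elements)
print(length(docs))
print(dtm)
```

Every row of dtm should sum to 3, since each document holds exactly three distinct words. For real work the tm version above is the better choice, because it also handles tokenization, case folding and sparse storage.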