Converting a matrix to a document-term matrix in R

Date: 2014-04-03 14:04:01

Tags: r matrix

I have a character vector that looks like this:

charVec[1:10]
[1] "dentistry"  "free"       "cache"      "key"        "containing" "cite"       "templates"  "deprecated" "errors"     "dates"  

I then generated all 3-word combinations of the vector:

combwords <- t(combn(charVec,3))

This gives me the following matrix of combinations:

    [,1]     [,2]     [,3]       
[1,] "import" "school" "dentistry"
[2,] "import" "school" "school"   
[3,] "import" "school" "log"      
[4,] "import" "school" "search"   
[5,] "import" "school" "current"  
[6,] "import" "school" "advanced" 

Now I want to create a document-term matrix (DTM) in which each row of the combination matrix is one document:

word_corpus <- Corpus(VectorSource(combwords))

This does not work. How can I make each row of the combination matrix a single document in the corpus?

1 Answer:

Answer 0: (score: 2)

library(tm)

# Collapse each row of the matrix into one space-separated string
foo <- apply(combwords, 1, paste, collapse = " ")
foo

##  [1] "dentistry free cache"       "dentistry free key"        
##  [3] "dentistry free containing"  "dentistry free cite"       
##  [5] "dentistry cache key"        "dentistry cache containing"
##  [7] "dentistry cache cite"       "dentistry key containing"  
##  [9] "dentistry key cite"         "dentistry containing cite" 
## [11] "free cache key"             "free cache containing"     
## [13] "free cache cite"            "free key containing"       
## [15] "free key cite"              "free containing cite"      
## [17] "cache key containing"       "cache key cite"            
## [19] "cache containing cite"      "key containing cite" 
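
As a rough illustration of why the rows are pasted together first: a matrix in R is a vector with a `dim` attribute, so anything that walks over it element by element sees one item per cell rather than one per row. The sketch below uses only base R; `words` is a hypothetical stand-in for the first six entries of `charVec`, and `foo2` is an assumed vectorised alternative, not part of the original answer.

```r
# Hypothetical stand-in for the first six words of charVec
words <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(words, 3))

# A matrix is a vector with a dim attribute: one element per cell
n_cells <- length(combwords)  # number of rows times 3 columns
n_rows  <- nrow(combwords)    # number of combinations

# Collapsing each row gives one string per combination
foo <- apply(combwords, 1, paste, collapse = " ")

# Vectorised alternative (an assumption, not from the answer):
# paste the three columns together in a single call
foo2 <- do.call(paste, as.data.frame(combwords, stringsAsFactors = FALSE))
```

Here `length(foo)` equals `n_rows`, whereas `n_cells` is three times larger, which is presumably why passing the matrix itself does not yield one document per row.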

# Each string in foo becomes one document in the corpus
tt <- Corpus(VectorSource(foo))
DocumentTermMatrix(tt)

## A document-term matrix (20 documents, 6 terms)
## 
## Non-/sparse entries: 60/60
## Sparsity           : 50%
## Maximal term length: 10 
## Weighting          : term frequency (tf)

as.matrix(DocumentTermMatrix(tt))

##     Terms
## Docs cache cite containing dentistry free key
##   1      1    0          0         1    1   0
##   2      0    0          0         1    1   1
##   3      0    0          1         1    1   0
##   4      0    1          0         1    1   0
##   5      1    0          0         1    0   1
##   6      1    0          1         1    0   0
##   7      1    1          0         1    0   0
##   8      0    0          1         1    0   1
##   9      0    1          0         1    0   1
##   10     0    1          1         1    0   0
##   11     1    0          0         0    1   1
##   12     1    0          1         0    1   0
##   13     1    1          0         0    1   0
##   14     0    0          1         0    1   1
##   15     0    1          0         0    1   1
##   16     0    1          1         0    1   0
##   17     1    0          1         0    0   1
##   18     1    1          0         0    0   1
##   19     1    1          1         0    0   0
##   20     0    1          1         0    0   1
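
For a quick cross-check of the table above without `tm`, the same counts can be rebuilt in base R. This is only a sketch under stated assumptions: `words` is a hypothetical stand-in for the first six entries of `charVec`, and the code counts whole words per row directly, so it does not reproduce any tokenisation or filtering that `DocumentTermMatrix` may apply.

```r
# Hypothetical stand-in for the first six words of charVec
words <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(words, 3))

# Column order of the result: the distinct words, sorted
terms <- sort(unique(as.vector(combwords)))

# Count how often each term occurs in each row (documents in rows)
dtm <- t(apply(combwords, 1, function(row) {
  tabulate(match(row, terms), nbins = length(terms))
}))
dimnames(dtm) <- list(Docs = seq_len(nrow(combwords)), Terms = terms)

dtm[1:3, ]
```

Because `tabulate` counts occurrences, a row that contains the same word twice would get a 2 in that column rather than a 1.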