Frequency distribution plot of a document-term matrix

Date: 2015-07-26 17:59:02

Tags: r ggplot2 tm

I created a document-term matrix.


I then took its column sums. Here is a sample of the matrix:

inspect(dtm[1:4,1:6])

              allowed allowing almost alone companyunder companywide 
Doc1.txt         1      1         1     0       1             0
Doc2.txt         0      1         1     0       1             1
Doc3.txt         0      0         0     1       0             1
Doc4.txt         1      0         1     0       1             1

The column sums show how many documents each word appears in (for example, `allowed` = 2 tells me that "allowed" appears in two documents).
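The relationship between column sums and document counts can be checked in R on the sample rows shown above. This is a minimal sketch, assuming the matrix fits in memory as a base R matrix (with a tm `DocumentTermMatrix`, convert it with `as.matrix()` first); the names `term_totals` and `doc_freq` are illustrative. `colSums(dtm)` counts total occurrences of each term, while `colSums(dtm > 0)` counts the documents containing it. The two agree only when every cell is 0 or 1.

```r
# Sample 4 x 6 document-term matrix copied from the inspect() output above
dtm <- matrix(c(1, 1, 1, 0, 1, 0,
                0, 1, 1, 0, 1, 1,
                0, 0, 0, 1, 0, 1,
                1, 0, 1, 0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("Doc1", "Doc2", "Doc3", "Doc4"),
                              c("allowed", "allowing", "almost", "alone",
                                "companyunder", "companywide")))

# Column sums: total occurrences of each term across all documents
term_totals <- colSums(dtm)

# Document frequency: number of documents containing each term.
# Counting nonzero cells stays correct even if a cell holds a count > 1.
doc_freq <- colSums(dtm > 0)

print(term_totals)
print(doc_freq)
```

For this 0/1 sample both vectors give allowed 2, allowing 2, almost 3, alone 1, companyunder 3, companywide 3.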

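I'm having trouble creating a frequency distribution plot with the document number on the x-axis and the number of words each document contains on the y-axis.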

1 answer:

Answer 0 (score: 0)

Is this what you're looking for?

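# Reproduce the 4 x 6 sample matrix (array() fills column by column)
dtm = array(c(1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,1,1,1),dim=c(4,6))
dimnames(dtm) = list(c("Doc1","Doc2","Doc3","Doc4"),c("allowed","allowing","almost","alone","companyunder","companywide"))
print(dtm)
# rowSums() gives each document's word count; plot() uses the index (document number) as the x-axis.
# With a tm DocumentTermMatrix, rowSums(as.matrix(dtm)) works on the full matrix.
plot(rowSums(dtm))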
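Since the question is tagged ggplot2, a similar plot can be sketched with `geom_col()`. This assumes ggplot2 is installed; `word_counts` and `p` are illustrative names, and the matrix is the same sample data used in the answer.

```r
library(ggplot2)

# Same 4 x 6 sample matrix as in the answer
dtm <- array(c(1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,1,0,1,1,1), dim = c(4, 6),
             dimnames = list(c("Doc1", "Doc2", "Doc3", "Doc4"),
                             c("allowed", "allowing", "almost", "alone",
                               "companyunder", "companywide")))

# One row per document: its number and its row sum (words counted in the DTM)
word_counts <- data.frame(doc = seq_len(nrow(dtm)), words = rowSums(dtm))

# Bar chart: document number on x, word count on y
p <- ggplot(word_counts, aes(x = doc, y = words)) +
  geom_col() +
  labs(x = "Document", y = "Number of words")

print(p)
```

A bar chart fits here because the x values are discrete documents; for this sample the bars have heights 4, 4, 2 and 4.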