I am trying to use inspect(TermDocumentMatrix()) to get a list of word/term frequencies across text documents (in R), using the example code from ?TermDocumentMatrix:
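library(tm)
data("crude")
# Term-document matrix with punctuation and stopwords removed
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
                                                stopwords = TRUE))
# Document-term matrix weighted by unnormalized tf-idf
dtm <- DocumentTermMatrix(crude, control = list(weighting = function(x)
                                                weightTfIdf(x, normalize = FALSE),
                                                stopwords = TRUE))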
Now I can inspect it:
inspect(tdm[1:1000, 1:5])
Result:
<<TermDocumentMatrix (terms: 1000, documents: 5)>>
Non-/sparse entries: 322/4678
Sparsity           : 94%
Maximal term length: 16
Weighting          : term frequency (tf)
Sample             :
            Docs
Terms        127 144 191 194 211
  crude        2   0   2   3   0
  demand       0   5   0   0   0
  dlrs         2   0   1   2   2
  mln          0   4   0   0   2
  oil          5  12   2   1   1
  opec         0  13   0   0   0
  price        2   1   2   2   0
  prices       3   5   0   0   0
  production   0   6   0   0   0
  said         3  11   1   1   3
However, I want a longer list of terms than the ten shown here. How can I get it?
I have already tried myinspection = inspect(tdm[1:1000, 1:5]), but it didn't get me anywhere.
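
For reference, here is a minimal sketch of one approach that might work, assuming the tm package and the same crude data set. It converts the matrix with as.matrix() so it can be handled as an ordinary base R matrix instead of reading the printed inspect() sample. The names m and freq are only illustrative.

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
                                                stopwords = TRUE))

# as.matrix() turns the sparse term-document matrix into a dense base R
# matrix (terms in rows, documents in columns); fine for a small corpus.
m <- as.matrix(tdm[, 1:5])

# Total frequency of each term across the selected documents, highest first.
freq <- sort(rowSums(m), decreasing = TRUE)

# Show the 50 most frequent terms; adjust the count as needed.
head(freq, 50)
```

Because m is a plain matrix, it can be subset, sorted, or written out with the usual tools, but for a large corpus the dense conversion may use a lot of memory.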