Consider the following document-term matrix created with the tm package:
> frequencies
<<DocumentTermMatrix (documents: 255, terms: 470)>>
Non-/sparse entries: 7693/112157
Sparsity : 94%
Maximal term length: 10
Weighting : term frequency (tf)
What does "Maximal term length" mean?
Answer 0 (score: 1)
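The maximal term length is the number of characters in the longest term (or terms) in the document-term matrix.
Example: if the dtm contains 5 words and the longest of them is "programming", the maximal term length is 11.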
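library(tm)

# Five one-word documents; the longest word, "programming", has 11 characters
text <- c("word1", "word2", "word3", "word4", "programming")
corp <- Corpus(VectorSource(text))
term <- DocumentTermMatrix(corp)
term  # printing the matrix shows its summary, including the maximal term length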
<<DocumentTermMatrix (documents: 5, terms: 5)>>
Non-/sparse entries: 5/20
Sparsity : 80%
Maximal term length: 11
Weighting : term frequency (tf)
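To check the reported value by hand, the figure can be recomputed from the terms themselves. This is a minimal base R sketch, not something the tm package requires: the variable names (`terms`, `term_lengths`, `max_len`, `longest`) are my own, and the term vector is typed in directly so the snippet runs without tm. With tm loaded, the same vector should be available from `Terms(term)` or `colnames(term)`.

```r
# Terms as they appear in the example matrix above. With tm loaded, this
# vector could instead be taken from Terms(term) or colnames(term).
terms <- c("word1", "word2", "word3", "word4", "programming")

# Number of characters in each term
term_lengths <- nchar(terms)

# The largest of those counts is what the summary calls "Maximal term length"
max_len <- max(term_lengths)

# Every term that reaches that length (there can be more than one)
longest <- terms[term_lengths == max_len]

print(max_len)
print(longest)
```

If several terms share the maximum length, `longest` lists all of them, which matches the "(or terms)" wording in the answer.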