R DocumentTermMatrix control list has no effect, silently ignores unknown parameters

Asked: 2012-11-13 18:54:41

Tags: r matrix controls term tm

I have the following two DTMs:

dtm <- DocumentTermMatrix(t)

dtmImproved <- DocumentTermMatrix(t, 
               control=list(minWordLength = 4, minDocFreq=5))

When I run this, I get two identical DTMs, and when I inspect dtmImproved it still contains 3-character words. Why does the minWordLength parameter have no effect? Thanks!

> dtm
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)
> dtmImproved
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)

2 answers:

Answer 0 (score: 25)

dtmImproved <- DocumentTermMatrix(t, control=list(wordLengths=c(4, 15), 
                                   bounds = list(global = c(5,Inf))))

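This solved the problem! Passing wordLengths and bounds instead of minWordLength and minDocFreq is what made the filter take effect. The lack of proper documentation really let me down (: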
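For anyone who wants to check this on their own machine, here is a minimal sketch on a made-up three-document corpus (the documents and variable names are my own, not from the question). It assumes the tm package is installed and that, as the question reports, an unrecognized control name such as minWordLength is dropped without a warning.

```r
library(tm)

# Toy corpus, invented purely for illustration.
docs <- c("apple banana cherry",
          "apple banana fig",
          "apple kiwi fig")
corpus <- VCorpus(VectorSource(docs))

# Default control versus the unrecognized minWordLength name.
dtm_default <- DocumentTermMatrix(corpus)
dtm_old     <- DocumentTermMatrix(corpus, control = list(minWordLength = 4))

# If minWordLength is silently ignored, both term lists should match.
print(Terms(dtm_default))
print(Terms(dtm_old))

# The recognized names: wordLengths = c(min, max) characters per term,
# bounds$global = c(min, max) number of documents a term must occur in.
dtm_new <- DocumentTermMatrix(
  corpus,
  control = list(wordLengths = c(4, Inf),
                 bounds = list(global = c(2, Inf))))

# Every remaining term should have at least 4 characters.
print(Terms(dtm_new))
print(nchar(Terms(dtm_new)))
```

Comparing Terms() of the two matrices is a quick way to tell whether a control entry was actually used, since the summary printed for a DTM will not tell you that an option was ignored.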

Answer 1 (score: 0)

When it is available, reading the source code is always a good idea. Looking at the source of the wordcloud function on GitHub, this is what it says:
    # Author: ianfellows
    .....
    if(min.freq > max(freq))
    min.freq <- 0

So your DocumentTermMatrix returned a max(freq) that is less than the min.freq bound you set, i.e. no term reaches the min.freq bound you set.
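To make the effect of that guard concrete, here is a small base R sketch with made-up frequencies. Only the two-line guard comes from the wordcloud source quoted above; the final filtering step is my own assumption about how the threshold is then applied.

```r
# Made-up term frequencies, for illustration only.
freq <- c(apple = 3, pear = 2, plum = 1)

# A threshold higher than every observed frequency.
min.freq <- 5

# The guard quoted from the wordcloud source: if nothing could pass
# the threshold, the threshold is reset to 0 without any message.
if (min.freq > max(freq))
  min.freq <- 0

# Assumed filtering step (not quoted from wordcloud): keep terms at or
# above the threshold. After the reset, every term is kept.
kept <- names(freq)[freq >= min.freq]
print(min.freq)
print(kept)
```

In other words, a threshold that is too high is quietly discarded rather than producing an empty result, which looks the same from the outside as a parameter being ignored.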

Hope this helps. MJJ