Here is my code for counting word frequencies:
word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]
arr_stop_kwd = ["a", "and"]
frequencies = Hash.new(0)
word_arr.each do |word|
  # Skip stop words and any token containing '&&'
  if !arr_stop_kwd.include?(word.downcase) && !word.include?('&&')
    frequencies[word.downcase] += 1
  end
end
With 100k words this takes 9.03 seconds. Is there a faster way to compute this?
Thanks in advance.
Answer 0 (score: 2)
You can use the frequency method from Facets to do something like this:
require 'facets'
frequencies = (word_arr - arr_stop_kwd).frequency
Note that the stop words can be subtracted from word_arr. See the Array documentation.
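One caveat: Array subtraction compares strings exactly, so it will not drop "And" when the stop list contains "and", and it does not apply the '&&' filter from the question. If you need the question's behaviour without an extra gem, a plain-Ruby version might look like the sketch below. The helper name word_frequencies and the sample data are made up for illustration, and putting the stop words in a Set is my assumption about what helps once the stop list grows. I have not benchmarked it against the 100k case.

```ruby
require 'set'

# Hypothetical helper (not from the question or from Facets).
# Counts case-insensitive word frequencies, skipping stop words
# and any token that contains '&&'.
def word_frequencies(words, stop_words)
  # A Set gives constant-time membership checks instead of scanning an Array.
  stop_set = stop_words.map(&:downcase).to_set

  words.each_with_object(Hash.new(0)) do |word, counts|
    next if word.include?('&&')      # same intent as word.match('&&')
    key = word.downcase              # downcase once and reuse it
    next if stop_set.include?(key)
    counts[key] += 1
  end
end

# Illustrative sample data only.
sample = ["I", "a", "And", "good", "Good", "China........;) &&"]
freqs = word_frequencies(sample, ["a", "and"])
```

Unlike the question's code, this also downcases the stop list itself, so a stop word given as "And" would still match.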