Show the user a message based on word counts in an array

Date: 2013-10-30 21:00:05

Tags: ruby arrays hash spam

I read a file and split its contents into an array of words:

file1_contents = File.read("spam1.txt")   # read the whole file; File.read closes it for you
file1 = file1_contents.split(' ')          # array of whitespace-separated words
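`String#split` with a single space as the pattern is a special case: it splits on runs of any whitespace (spaces, tabs, newlines) and ignores leading whitespace. Punctuation is not stripped, so a word followed by a comma or period is a different array element. A minimal sketch, using an inline string in place of spam1.txt so it runs without the file (the sample text is made up):

```ruby
# Stand-in for File.read("spam1.txt"); this sample text is hypothetical.
contents = "  Dear friend,\n\tplease  transfer the money\n"

# split(' ') splits on any run of whitespace and drops leading whitespace,
# unlike split(/ /), which splits on every single space character.
words = contents.split(' ')
p words # prints ["Dear", "friend,", "please", "transfer", "the", "money"]

# Note that "friend," keeps its comma, so it would not match "friend".
```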

I can count word frequencies with a hash and sort the words by frequency:

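freqs1 = Hash.new(0)                            # unseen words start at a count of 0
file1.each { |word| freqs1[word] += 1 }         # tally each word
freqs1 = freqs1.sort_by { |_word, freq| freq }  # now an array of [word, freq] pairs, ascending
freqs1.reverse!                                 # most frequent first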
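On Ruby 2.7 or later, `Enumerable#tally` builds the same word => count hash in one call, and sorting by the negated count gives descending order without a separate `reverse!`. A sketch with a made-up word list:

```ruby
# Hypothetical word list standing in for file1.
words = %w[money fund money Iraq money fund transfer]

# tally (Ruby 2.7+) returns a Hash mapping each element to its number of occurrences:
# here money => 3, fund => 2, Iraq => 1, transfer => 1.
freqs = words.tally

# sort_by on a Hash yields [word, count] pairs; negating the count sorts the
# most frequent words first. The result is an Array of pairs, not a Hash.
sorted = freqs.sort_by { |_word, count| -count }

sorted.each { |word, count| puts "#{word} #{count}" }
```

Because the sorted result is an array of pairs, `sorted.to_h` turns it back into a Hash if needed; Ruby hashes preserve insertion order, so the descending order is kept.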

I can also print the results for the user:

freqs1.each { |word, freq| puts "#{word} #{freq}" }

I want to show the user a message if the array file1 or the hash freqs1 contains certain words more than a given number of times.

My (bad) idea was to loop over the freqs1 hash and print the corresponding message:

freqs1.each{|word,freq|
    if ((word == ('business' || 'fund' || 'funds' || 'account' ||'transfer' || 'money')) && freq > 2)  || (word == 'Iraq' && freq > 1 )  then
      puts "File 1 is suspected as spam mail - suspicious word frequency"
    else
      puts "File 1 does not appear to be spam email"
    end
}

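However, this seems silly to me, because it prints a message for every element of the hash instead of one message for the whole file.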
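Separately from printing once per element, the condition in that loop has a bug: `||` returns its first truthy operand, and every string (even an empty one) is truthy in Ruby, so `('business' || 'fund' || ...)` evaluates to just `'business'`. A small check, compared with `Array#include?`:

```ruby
# || returns the first truthy operand; any String is truthy in Ruby.
grouped = ('business' || 'fund' || 'funds' || 'account' || 'transfer' || 'money')
puts grouped # prints business

word = 'money'
puts word == grouped # prints false: 'money' is never matched by this test

# Testing membership in an array checks every candidate word instead.
bad_words = %w[business fund funds account transfer money]
puts bad_words.include?(word) # prints true
```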

How can I show the user a single message if words such as business, fund, funds, account, etc. appear more than twice?

Thanks for your help.

2 answers:

Answer 0 (score: 1)

If you just want to improve the final statement, try this (untested, but it should work):

bad_words = %w{business fund funds account transfer money}
is_spam = freqs1.any? do |word, freq| 
  (freq > 2 && bad_words.include?(word)) || (word == 'Iraq' && freq > 1)
end

if is_spam
  puts "File 1 is suspected as spam mail - suspicious word frequency"
else
  puts "File 1 does not appear to be spam email"
end

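Enumerable#any? does most of the work for you: it returns true as soon as the block is true for any [word, freq] pair, so only one message is printed. Pulling the list of bad words out into its own variable also improves readability.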
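A self-contained version of the same check that runs without spam1.txt. The method name `spam?` and the sample frequency tables are hypothetical; the condition is the one from the answer. It works whether freqs1 is still a Hash or has already become an array of pairs via `sort_by`:

```ruby
BAD_WORDS = %w[business fund funds account transfer money].freeze

# Hypothetical helper wrapping the answer's condition.
# freqs may be a Hash or an Array of [word, count] pairs; any? yields
# [word, count] pairs in both cases and stops at the first match.
def spam?(freqs)
  freqs.any? do |word, freq|
    (freq > 2 && BAD_WORDS.include?(word)) || (word == 'Iraq' && freq > 1)
  end
end

# Made-up frequency tables standing in for freqs1.
suspicious = { 'please' => 1, 'transfer' => 1, 'money' => 3 }
harmless   = { 'hello' => 2, 'world' => 1, 'Iraq' => 1 }

[suspicious, harmless].each do |freqs|
  puts spam?(freqs) ? 'suspected as spam mail' : 'does not appear to be spam email'
end
```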

Answer 1 (score: 1)

I would do something like this:

word_filter = [
 {count: 2, words: ['business','fund','funds','account','transfer','money']},
 {count: 1, words: ['iraq']}
]

alert        = "File 1 is suspected as spam mail - suspicious word frequency"
calm_message = "File 1 does not appear to be spam email"

grouped_words = file1.group_by{|x|x}.map{|x,array|[x,array.size]}   # [[word, count], ...]

appears_to_be_spam = grouped_words.any?{ |word,count|
  word_filter.any? do |filter|
    filter[:words].include?(word.downcase) &&  count > filter[:count]
  end
}

puts appears_to_be_spam ? alert : calm_message
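One detail of the table-driven filter: `group_by{|x|x}` counts "Iraq" and "IRAQ" as separate words, and `downcase` is applied only at lookup time, so mixed-case occurrences are never added together. The sketch below is a variation that downcases before counting; the method name `appears_to_be_spam?` and the sample arrays are hypothetical, and `group_by(&:itself)` is equivalent to `group_by{|x|x}`:

```ruby
WORD_FILTER = [
  { count: 2, words: %w[business fund funds account transfer money] },
  { count: 1, words: %w[iraq] }
].freeze

# Hypothetical helper: downcases each word *before* grouping, so that
# 'Iraq' and 'IRAQ' contribute to the same count.
def appears_to_be_spam?(words)
  counts = words.map(&:downcase).group_by(&:itself).map { |word, list| [word, list.size] }
  counts.any? do |word, count|
    WORD_FILTER.any? { |filter| filter[:words].include?(word) && count > filter[:count] }
  end
end

puts appears_to_be_spam?(%w[Iraq IRAQ peace])   # prints true
puts appears_to_be_spam?(%w[Money money hello]) # prints false (2 is not more than 2)
```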