How can I get word frequencies efficiently in Ruby?

Asked: 2012-03-12 21:35:18

Tags: ruby regex

Sample input:

"I was 09809 home -- Yes! yes!  You was"

Expected output:

{ 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 }

My code doesn't work:

def get_words_f(myStr)
    myStr=myStr.downcase.scan(/\w/).to_s;
    h = Hash.new(0)
    myStr.split.each do |w|
       h[w] += 1 
    end
    return h.to_a;
end

print get_words_f('I was 09809 home -- Yes! yes!  You was');

7 answers:

Answer 0 (score: 19)

This works, but I'm new to Ruby too. There may be a better solution.

def count_words(string)
  words = string.split(' ')
  frequency = Hash.new(0)
  words.each { |word| frequency[word.downcase] += 1 }
  return frequency
end

Instead of .split(' ') you could also use .scan(/\w+/); however, .scan(/\w+/) would separate "aren" and "t" in "aren't", while .split(' ') would not.

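Output of the sample code (this is the result with the .scan(/\w+/) variant; with .split(' ') the punctuation stays attached to the words, giving keys such as "yes!" and "--"):

print count_words('I was 09809 home -- Yes! yes!  You was');

# {"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "yes"=>2, "you"=>1}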
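
To make the difference between .split(' ') and .scan(/\w+/) concrete, here is a small sketch that runs both on the same string. The helper names tokens_by_split and tokens_by_scan and the sample sentence are made up for illustration only.

```ruby
# Hypothetical helpers, named only for this comparison.

# Splits on whitespace; punctuation stays attached to each token.
def tokens_by_split(text)
  text.split(' ')
end

# Keeps only runs of word characters (letters, digits, underscore).
def tokens_by_scan(text)
  text.scan(/\w+/)
end

sample = "They aren't home -- Yes! yes!"

p tokens_by_split(sample)  # => ["They", "aren't", "home", "--", "Yes!", "yes!"]
p tokens_by_scan(sample)   # => ["They", "aren", "t", "home", "Yes", "yes"]
```

Neither approach is strictly better: split keeps contractions whole but also keeps punctuation, while scan drops punctuation but breaks words at the apostrophe.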

Answer 1 (score: 7)

def count_words(string)
  string.scan(/\w+/).reduce(Hash.new(0)){|res,w| res[w.downcase]+=1;res}
end

Second variant:

def count_words(string)
  string.scan(/\w+/).each_with_object(Hash.new(0)){|w,h| h[w.downcase]+=1}
end
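
On newer Rubies the same counting can probably be written more compactly. This is a sketch that assumes Ruby 2.7 or later, where Enumerable#tally is available; the name count_words_tally is hypothetical and not part of the original answer.

```ruby
# Assumes Ruby 2.7+ (Enumerable#tally). The method name is illustrative.
def count_words_tally(string)
  # Extract word-character runs, normalize case, then count occurrences.
  string.scan(/\w+/).map(&:downcase).tally
end

p count_words_tally('I was 09809 home -- Yes! yes!  You was')
```

One difference from the variants above: the hash returned by tally has no default value, so looking up a word that never occurred returns nil rather than 0.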

Answer 2 (score: 6)

def count_words(string)
  Hash[
    string.scan(/[a-zA-Z]+/)
      .group_by{|word| word.downcase}
      .map{|word, words|[word, words.size]}
  ]
end

puts count_words 'I was 09809 home -- Yes! yes!  You was'

Answer 3 (score: 3)

This code asks you for input and then finds the word frequencies for you:

puts "enter some text man"
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word.downcase] += 1 }
frequencies = frequencies.sort_by {|a, b| b}
frequencies.reverse!
frequencies.each do |word, frequency|
    puts word + " " + frequency.to_s 
end

Answer 4 (score: 2)

This works and also ignores numbers:

def get_words(my_str)
    my_str = my_str.scan(/\w+/)
    h = Hash.new(0)
    my_str.each do |s|
        s = s.downcase
        if s !~ /^[0-9]*\.?[0-9]+$/ 
            h[s] += 1
        end
    end
    return h
end

print get_words('I was there 1000 !')
puts  # print a newline (single-quoted '\n' would print a literal backslash and n)

Answer 5 (score: 2)

You can look at my code, which splits text into words. The basic code is as follows:

sentence = "Ala ma kota za 5zł i 10$."
splitter = SRX::Polish::WordSplitter.new(sentence)
histogram = Hash.new(0)
splitter.each do |word,type|
  histogram[word.downcase] += 1 if type == :word
end
p histogram

If you want to work with languages other than English, be careful: in Ruby 1.9, downcase does not work as you would expect for letters such as "Ł".
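
The caveat above applies to older Rubies. As far as I know, String#downcase has applied full Unicode case mapping since Ruby 2.4, so a plain-Ruby approximation is possible there. The following is a minimal sketch under that assumption (Ruby 2.4+ and UTF-8 strings); count_words_unicode and the \p{L}+ pattern are illustrative choices and are not part of the SRX splitter shown above, which will likely segment text differently.

```ruby
# Assumes Ruby 2.4+ and UTF-8 strings. The method name is hypothetical.
def count_words_unicode(text)
  # \p{L}+ matches runs of Unicode letters, so digits and symbols are skipped.
  text.scan(/\p{L}+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase] += 1
  end
end

p "Ł".downcase  # => "ł" on Ruby 2.4 and later
p count_words_unicode("Łódź łódź ma kota za 5zł")
```

Here "Łódź" and "łódź" are counted under the same key, and the "zł" in "5zł" is counted as a word while the digit is dropped.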

Answer 6 (score: 2)

class String
  def frequency
    self.scan(/[a-zA-Z]+/).each.with_object(Hash.new(0)) do |word, hash|
      hash[word.downcase] += 1
    end
  end
end

puts "I was 09809 home -- Yes! yes!  You was".frequency