Build an array ordered by descending match count?

Time: 2014-04-30 22:32:31

Tags: ruby hash frequency

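I have a hash where the keys are book titles and the values are arrays of the words in each book.

I want to write a method that takes a word, searches the hash, and finds which arrays contain that word most often. It should then return an array of book titles ordered by number of matches.

That is, the returned array should list the titles in descending order of how many times the search word occurs in each book.

Here is what I have so far: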

def search(query)
  books_names = @book_info.keys
  book_info = {}

  @result.each do |key, value|
    contents = @result[key]
    if contents.include?(query)
      book_info[:key] += 1
    end
  end

3 answers:

Answer 0 (score: 3)

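If book_info is your hash and input_str is the string you want to search for, the following returns the hash sorted in descending order of how often input_str occurs in each book's word array: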

 Hash[book_info.sort_by{|k, v| v.count(input_str)}.reverse]

If you want the output to be an array of book titles, drop Hash and take the first element of each pair:

 book_info.sort_by{|k, v| v.count(input_str)}.reverse.map(&:first)

Here is a more compact version (but a little bit slower) that replaces reverse with a negated sort criterion:

 book_info.sort_by{|k, v| -v.count(input_str)}.map(&:first)
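To see the one-liners above in action, here is a minimal sketch that wraps the negated sort_by in a method. The method name rank_titles and the sample data are assumptions made up for illustration; they are not part of the question or the answer.

```ruby
# Hypothetical wrapper around the one-liner: returns titles ordered by
# how often input_str occurs in each book's word array (most first).
def rank_titles(book_info, input_str)
  # Array#count(obj) returns the number of elements equal to obj.
  book_info.sort_by { |_title, words| -words.count(input_str) }.map(&:first)
end

# Made-up sample data: title => array of words.
book_info = {
  "Moby Dick"     => %w[call me ishmael some years ago],
  "War and Peace" => %w[well prince genoa and lucca and more],
  "Walden"        => %w[i went to the woods and lived and worked and read]
}

ranked = rank_titles(book_info, "and")
ranked.each { |title| puts "#{title}: #{book_info[title].count('and')}" }
# prints:
#   Walden: 3
#   War and Peace: 2
#   Moby Dick: 0
```

Note that Array#count compares whole elements, so "and" does not match "And" or "android"; downcase the words beforehand if case should be ignored.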

Answer 1 (score: 1)

You may want to consider creating a Book class. Here is a Book class that indexes each book's words into a word_count hash so that sorting is fast.

class Book
  attr_accessor :title, :words
  attr_reader :word_count

  @books = []

  class << self
    attr_accessor :books

    def top(word)
      @books.sort_by{|b| b.word_count[word.downcase]}.reverse
    end
  end

  def initialize
    self.class.books << self
    @word_count = Hash.new { |h,k| h[k] = 0}
  end

  def words=(str)
    str.gsub(/[^\w\s]/,"").downcase.split.each do |word|
      word_count[word] += 1
    end
  end

  def to_s
    title
  end
end

Use it like this:

a = Book.new
a.title = "War and Peace"
a.words = "WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bonaparte family."

b = Book.new
b.title = "Moby Dick"
b.words = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."

puts Book.top("ago")

Result:

Moby Dick
War and Peace
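The indexing idea can also be tried without a class. The sketch below is one possible variation, not the answer's code: the helper names count_words and top_titles are hypothetical, it uses scan(/\w+/) instead of the gsub/split pair, and it uses Hash.new(0), which returns 0 for a missing word without storing a new key (the block form in the class above stores a key on every lookup).

```ruby
# Hypothetical helper: count the lowercased words of one text.
def count_words(text)
  counts = Hash.new(0) # missing words read as 0 and are not stored
  text.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  counts
end

# Hypothetical helper: titles ordered by descending count of word.
def top_titles(index, word)
  index.sort_by { |_title, counts| -counts[word.downcase] }.map(&:first)
end

texts = {
  "War and Peace" => "WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bonaparte family.",
  "Moby Dick"     => "Call me Ishmael. Some years ago - never mind how long precisely."
}

# title => { word => count }
index = texts.map { |title, text| [title, count_words(text)] }.to_h

p top_titles(index, "Ago")            #=> ["Moby Dick", "War and Peace"]
p index["War and Peace"].key?("ago")  #=> false
```

This returns an array of title strings, which is what the question asks for; with the Book class the same result would come from mapping the returned books to their titles.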

Answer 2 (score: 1)

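Here is one way to construct a hash whose keys are words and whose values are arrays of hashes with the keys :title and :count, with each array sorted by decreasing value of :count.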

Code

I assume we start with a hash books whose keys are titles and whose values are a single string containing all the text of the book.

def word_count_hash(books)
  word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
    h[title] = words.scan(/\w+/)
                    .map(&:downcase)
                    .each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }

  title_and_count_by_word = word_and_count_by_title
    .each_with_object({}) { |(title,words),g| words.each { |w,count|
      g.update({w =>[{title: title, count: count}]}){|_,oarr,narr|oarr+narr}}}

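  # Sort each word's array in place, highest count first. (The block
  # variable g is out of scope here, so iterate over the hash's values.)
  title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }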
  title_and_count_by_word
end

Example

books = {}
books["Grapes of Wrath"] =
<<_ 
To the red country and part of the gray country of Oklahoma, the last rains
came gently, and they did not cut the scarred earth. The plows crossed and
recrossed the rivulet marks. The last rains lifted the corn quickly and
scattered weed colonies and grass along the sides of the roads so that the
gray country and the dark red country began to disappear under a green cover.
_

books["Tale of Two Cities"] =
<<_ 
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we were all
going direct the other way
_

books["Moby Dick"] =
<<_ 
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul; whenever I find myself involuntarily pausing before coffin
warehouses
_

Construct the hash:

title_and_count_by_word = word_count_hash(books)

Then look up words:

title_and_count_by_word["the"]
  #=> [{:title=>"Grapes of Wrath", :count=>12},
  #    {:title=>"Tale of Two Cities", :count=>11},
  #    {:title=>"Moby Dick", :count=>5}]
title_and_count_by_word["to"]
  #=> [{:title=>"Grapes of Wrath", :count=>2},
  #    {:title=>"Tale of Two Cities", :count=>1},
  #    {:title=>"Moby Dick", :count=>1}]

Note that the word being looked up must be entered in (or converted to) lowercase.
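Since lookups must be lowercase and the question asks for an array of titles, a small wrapper can take care of both. This is only a sketch: titles_for is a hypothetical helper, and the index below is a hand-written fragment in the same shape as the hash built above, not the full result.

```ruby
# Hypothetical helper: normalize the query and return only the titles,
# in the order stored in the index (already sorted by descending count).
def titles_for(index, word)
  # fetch with a default avoids a nil error for words in no book.
  index.fetch(word.downcase, []).map { |entry| entry[:title] }
end

# Hand-written fragment shaped like title_and_count_by_word.
index = {
  "the" => [{ title: "Grapes of Wrath", count: 12 },
            { title: "Tale of Two Cities", count: 11 },
            { title: "Moby Dick", count: 5 }],
  "ishmael" => [{ title: "Moby Dick", count: 1 }]
}

p titles_for(index, "The")      #=> ["Grapes of Wrath", "Tale of Two Cities", "Moby Dick"]
p titles_for(index, "Ishmael")  #=> ["Moby Dick"]
p titles_for(index, "whale")    #=> []
```

Books that do not contain the word are simply absent from the result, unlike the sort-based answers above, which list every title.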

Explanation

Construct the first hash:

word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
  h[title] = words.scan(/\w+/)
                  .map(&:downcase)
                  .each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
  #=> {"Grapes of Wrath"=>
  #      {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
  #       ...
  #       "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1},
  #    "Tale of Two Cities"=>
  #      {"it"=>10, "was"=>10, "the"=>11, "best"=>1, "of"=>10, "times"=>2,
  #       ...
  #       "going"=>2, "direct"=>2, "to"=>1, "heaven"=>1, "other"=>1, "way"=>1},
  #    "Moby Dick"=>
  #      {"call"=>1, "me"=>2, "ishmael"=>1, "some"=>1, "years"=>1, "ago"=>1,
  #       ...
  #       "pausing"=>1, "before"=>1, "coffin"=>1, "warehouses"=>1}}

To see what is happening here, consider the first element of books that each_with_object passes into the block. The two block variables are assigned the following values:

title #=> "Grapes of Wrath"
words #=> "To the red country and part of the gray country of Oklahoma, the
      #    last rains came gently,\nand they did not cut the scarred earth.
      #    ...
      #    the dark red country began to disappear\nunder a green cover.\n"

each_with_object has created a hash, represented by the block variable h, which is initially empty.

First construct an array of the words and convert each word to lowercase.

q = words.scan(/\w+/).map(&:downcase)
  #=> ["to", "the", "red", "country", "and", "part", "of", "the", "gray",
  #    ...
  #    "began", "to", "disappear", "under", "a", "green", "cover"]

We can now create a hash that contains the count of each word for the title "Grapes of Wrath":

h[title] = q.each_with_object({}) { |w,g| g[w] = (g[w] || 0) + 1 }
  #=> {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
  #    ...
  #    "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1}

Note the expression

g[w] = (g[w] || 0) + 1

If the hash g already has a key for the word w, this expression is equivalent to

g[w] = g[w] + 1

On the other hand, if g does not have this key (word), in which case g[w] #=> nil, the expression is equivalent to

g[w] = 0 + 1

The same calculations are then performed for each of the other two books.

We can now construct the second hash.

title_and_count_by_word =
  word_and_count_by_title.each_with_object({}) { |(title,words),g|
    words.each { |w,count| g.update({ w => [{title: title, count: count}]}) \
      { |_, oarr, narr| oarr + narr } } }
  #=> {"to"        => [{:title=>"Grapes of Wrath", :count=>2},
  #                    {:title=>"Tale of Two Cities", :count=>1},
  #                    {:title=>"Moby Dick", :count=>1}],
  #    "the"       => [{:title=>"Grapes of Wrath", :count=>12},
  #                    {:title=>"Tale of Two Cities", :count=>11},
  #                    {:title=>"Moby Dick", :count=>5}],
  #    ...
  #    "warehouses"=> [{:title=>"Moby Dick", :count=>1}]}

(Note that this operation does not order the hashes for each word by :count, even though it may look that way in this output fragment. The arrays are sorted in the next and final step.)

The main operation that needs explaining here is Hash#update (also known as Hash#merge!). We are building a hash, represented by the block variable g, which is initially empty. The keys of this hash are words, and the values are arrays of hashes with the keys :title and :count. Whenever the hash being merged has a key (word) that is already a key of g, the block

{ |_, oarr, narr| oarr + narr }

is called to determine the value for that key in the merged hash. The block variables here are the key (word), which we replace with an underscore because it is not used, the old array of hashes, and the new array of hashes to be merged (which contains just one hash). We simply append the new hash to the existing array of hashes.

Lastly, we sort the values of the hash (the arrays of hashes) by decreasing value of :count.

title_and_count_by_word.each_value { |arr| arr.sort_by! { |h| -h[:count] } }
title_and_count_by_word
  #=> {"to"=>
  #     [{:title=>"Grapes of Wrath", :count=>2},
  #      {:title=>"Tale of Two Cities", :count=>1},
  #      {:title=>"Moby Dick", :count=>1}],
  #    "the"=>
  #     [{:title=>"Grapes of Wrath", :count=>12},
  #      {:title=>"Tale of Two Cities", :count=>11},
  #      {:title=>"Moby Dick", :count=>5}],
  #    ...
  #    "warehouses"=>[{:title=>"Moby Dick", :count=>1}]}
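The merge-with-block step and the final sort can be tried in isolation on a few entries. This is a minimal sketch with made-up counts; add_entry is a hypothetical helper introduced only for this illustration.

```ruby
# Hypothetical helper: record one (word, title, count) triple in g.
def add_entry(g, word, title, count)
  # If word is not yet a key, the pair is stored as is. If it is, the
  # block receives the key, the old array, and the new one-element
  # array, and its return value becomes the stored value.
  g.update({ word => [{ title: title, count: count }] }) { |_, oarr, narr| oarr + narr }
end

g = {}
add_entry(g, "to", "Moby Dick", 1)
add_entry(g, "to", "Grapes of Wrath", 2)
add_entry(g, "ago", "Moby Dick", 1)

# Before sorting, entries are in insertion order.
p g["to"].map { |h| h[:title] }  #=> ["Moby Dick", "Grapes of Wrath"]

# Sort each word's array in place, highest count first.
g.each_value { |arr| arr.sort_by! { |h| -h[:count] } }

p g["to"].map { |h| h[:title] }  #=> ["Grapes of Wrath", "Moby Dick"]
p g["ago"].map { |h| h[:title] } #=> ["Moby Dick"]
```

Because sort_by! is not guaranteed to be stable, titles with equal counts may come out in either order.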