Group an array of hashes by a hash value

Time: 2014-08-22 18:31:43

Tags: ruby arrays select hash reduce

I have an array of hashes with values like:

by_person = [{ :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"]}, {:person => "John Doe", :filenames => ["Report.pdf"] }]

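I want to end up with another array of hashes (by_file) that has one entry for each unique value found under the :filenames key, listing the people who have that file: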

by_file = [{ :filename => "Report.pdf", :people => ["Jane Smith", "John Doe"] }, { :filename => "File2.pdf", :people => ["Jane Smith"] }]

I tried:

by_file = []

by_person.each do |person|
  person[:filenames].each do |file|
    unless by_file.include?(file)
      # list people that are included in file
      by_person_each_file = by_person.select{|person| person[:filenames].include?(file)}
      by_person_each_file.each do |person|
        by_file << {
          :file => file,
          :people => person[:person]
        }
      end
    end
  end
end

and also:

by_file.map(&:to_a).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}

Any feedback is appreciated, thanks!

2 Answers:

Answer 0 (score: 3)

This doesn't look too tricky, but the way you're building it isn't very efficient:

by_person = [{ :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"]}, {:person => "John Doe", :filenames => ["Report.pdf"] }]

by_file = by_person.each_with_object({ }) do |entry, index|
  entry[:filenames].each do |filename|
    set = index[filename] ||= [ ]
    set << entry[:person]
  end
end.collect do |filename, people|
  {
    filename: filename,
    people: people
  }
end

puts by_file.inspect
# => [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]}, {:filename=>"File2.pdf", :people=>["Jane Smith"]}]

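This uses a hash to group the people by filename, essentially inverting your structure, and then converts it into the final format in a second pass. That is more efficient than working with the final format while building it, because an array of hashes is not indexed and needs an expensive linear search to find the right container to insert into.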
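For contrast, here is a minimal sketch of what building the final format directly might look like. It is not taken from the question or the answer; the method name by_file_linear is made up for illustration. Every insertion has to walk by_file with find, which is the linear search that the hash-based version avoids.

```ruby
by_person = [
  { :person => "Jane Smith", :filenames => ["Report.pdf", "File2.pdf"] },
  { :person => "John Doe",   :filenames => ["Report.pdf"] }
]

# Hypothetical helper, for illustration only: builds the final array directly.
def by_file_linear(by_person)
  by_file = []
  by_person.each do |entry|
    entry[:filenames].each do |filename|
      # Linear scan: by_file is searched on every insertion.
      container = by_file.find { |h| h[:filename] == filename }
      unless container
        container = { :filename => filename, :people => [] }
        by_file << container
      end
      container[:people] << entry[:person]
    end
  end
  by_file
end

linear_result = by_file_linear(by_person)
```

The result is the same array of hashes, but the work grows with the number of distinct filenames for each insertion, whereas a hash lookup by filename stays roughly constant.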

Another approach is to give the hash a default block that builds the structure you're looking for:

by_file_hash = Hash.new do |h,k|
  h[k] = {
    filename: k,
    people: [ ]
  }
end

by_person.each do |entry|
  entry[:filenames].each do |filename|
    by_file_hash[filename][:people] << entry[:person]
  end
end

by_file = by_file_hash.values

puts by_file.inspect
# => [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]}, {:filename=>"File2.pdf", :people=>["Jane Smith"]}]

This may or may not be easier to understand.
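If the default block is the unfamiliar part, this small sketch shows it in isolation. The variable names index, present_before and entries are only for illustration; the block itself is the same shape as the one above.

```ruby
# A missing key runs the block, which creates and stores the entry for that key.
index = Hash.new do |h, k|
  h[k] = { filename: k, people: [] }
end

present_before = index.key?("Report.pdf")     # nothing stored yet

index["Report.pdf"][:people] << "Jane Smith"  # first access runs the block
index["Report.pdf"][:people] << "John Doe"    # later accesses reuse the stored entry

entries = index.values
```

One thing to watch with this pattern: merely reading a missing key stores an entry for it, so use key? when you only want to check whether a filename is present.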

Answer 1 (score: 0)

Here's one way to do it.

Code

def convert(by_person)
  by_person.each_with_object({}) do |hf,hp|
    hf[:filenames].each do |fname|
      hp.update({ fname=>[hf[:person]] }) { |_,oh,nh| oh+nh }
    end    
  end.map { |fname,people| { :filename => fname, :people=>people } }
end

Example

by_person = [{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]},
             {:person=>"John Doe",   :filenames=>["Report.pdf"]}]

convert(by_person)
  #=> [{:filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"]},
  #    {:filename=>"File2.pdf",  :people=>["Jane Smith"]}]

Explanation

For by_person in the example:

enum1 = by_person.each_with_object({})
  #=> #<Enumerator: [{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]},
  #    {:person=>"John Doe", :filenames=>["Report.pdf"]}]:each_with_object({})>

Let's look at the values the enumerator enum1 will pass into the block:

enum1.to_a
  #=> [[{:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]}, {}], 
  #    [{:person=>"John Doe", :filenames=>["Report.pdf"]}, {}]]

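As you'll see below, the empty hash in the first element of the enumerator will no longer be empty by the time the second element is passed to the block.

The first element is assigned to the block variables as follows (I've indented to indicate the block level):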

  hf = {:person=>"Jane Smith", :filenames=>["Report.pdf", "File2.pdf"]}
  hp = {}

  enum2 = hf[:filenames].each
    #=> #<Enumerator: ["Report.pdf", "File2.pdf"]:each>
  enum2.to_a
    #=> ["Report.pdf", "File2.pdf"]

"Report.pdf" is passed to the inner block and assigned to the block variable:

  fname = "Report.pdf"

and the following is executed, returning the updated hp:

  hp.update({ "Report.pdf"=>["Jane Smith"] }) { |_,oh,nh| oh+nh }
    #=> {"Report.pdf"=>["Jane Smith"]}

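The block of Hash#update (a.k.a. Hash#merge!) is not consulted here. It is needed only when the hash hp and the hash being merged (here { fname=>["Jane Smith"] }) have one or more keys in common. For each common key, the key and the corresponding values from the two hashes are passed to the block. This is explained in detail below.

Next, enum2 passes "File2.pdf" into the block and assigns it to the block variable: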

  fname = "File2.pdf"

and executes

  hp.update({ "File2.pdf"=>["Jane Smith"] }) { |_,oh,nh| oh+nh }
    #=> {"Report.pdf"=>["Jane Smith"], "File2.pdf"=>["Jane Smith"]}

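which returns the updated value of hp. Again, update's block was not consulted. We're now done with Jane, so enum1 next passes its second and last value into the block and assigns the block variables as follows: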

hf = {:person=>"John Doe", :filenames=>["Report.pdf"]}
hp = {"Report.pdf"=>["Jane Smith"], "File2.pdf"=>["Jane Smith"]}

Note that hp has now been updated. We then have:

  enum2 = hf[:filenames].each
    #=> #<Enumerator: ["Report.pdf"]:each>
  enum2.to_a
    #=> ["Report.pdf"]

enum2 assigns

  fname = "Report.pdf"

and executes:

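  hp.update({ "Report.pdf"=>["John Doe"] }) { |_,oh,nh| oh+nh }
    #=> {"Report.pdf"=>["Jane Smith", "John Doe"], "File2.pdf"=>["Jane Smith"]}

In performing this update, both hp and the hash being merged have the key "Report.pdf". The following values are therefore passed to the block variables |k,oh,nh|: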

  k  = "Report.pdf"
  oh = ["Jane Smith"] 
  nh = ["John Doe"]

We don't need the key, so I've replaced it with an underscore. The block returns

["Jane Smith"]+["John Doe"] #=> ["Jane Smith", "John Doe"]

which becomes the new value for the key "Report.pdf".
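The behaviour of update's block described in this walk-through can be checked in isolation with a small sketch. The calls array and calls_after_first are additions of mine that record when the block runs; they are not part of the answer's code.

```ruby
calls = []  # records the key each time the block runs (illustration only)
hp = {}

# No key in common yet, so the block is not called.
hp.update({ "Report.pdf" => ["Jane Smith"] }) { |k, oh, nh| calls << k; oh + nh }
calls_after_first = calls.dup

# "Report.pdf" is now in both hashes, so the block receives the key,
# the old value and the new value, and its return value is stored.
hp.update({ "Report.pdf" => ["John Doe"] }) { |k, oh, nh| calls << k; oh + nh }
```

After the second call, calls holds a single entry and hp maps "Report.pdf" to both names.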

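Before moving on to the last step, I'd like to suggest that you consider stopping here. That is, rather than building an array of hashes, one per file, leave it as a hash with the files as keys and arrays of people as values: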

{ "Report.pdf"=>["Jane Smith", "John Doe"], "File2.pdf"=>["Jane Smith"] }
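As a rough illustration of why the hash form can be convenient, here is a sketch of two typical queries against it. The variable names and the filename "Other.pdf" are hypothetical and used only for the example.

```ruby
files = { "Report.pdf" => ["Jane Smith", "John Doe"], "File2.pdf" => ["Jane Smith"] }

# Direct lookup by filename, with no scan through an array of hashes.
report_people = files["Report.pdf"]

# fetch with a default avoids nil for a filename that is not present.
missing_people = files.fetch("Other.pdf", [])

# The reverse question (which files does a person have?) is still a short query.
janes_files = files.select { |_, people| people.include?("Jane Smith") }.keys
```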

The last step is simple:

hp.map { |fname,people| { :filename => fname, :people=>people } }
  #=> [{ :filename=>"Report.pdf", :people=>["Jane Smith", "John Doe"] },
  #    { :filename=>"File2.pdf", :people=>["Jane Smith"] }]