How to reorganize/filter an array of hashes

Time: 2015-08-28 06:46:19

Tags: ruby

I have an array of hashes:

data = [{"user_id"=>1, "answer"=>"cupcakes"},
 {"user_id"=>1, "answer"=>"Colorado"},
 {"user_id"=>1, "answer"=>"newspaper"},
 {"user_id"=>2, "answer"=>"fruitcake"},
 {"user_id"=>2, "answer"=>"Louisiana"},
 {"user_id"=>2, "answer"=>"tv"}]

How can I reorganize it so that it is grouped by "user_id" and lists all the "answer" values in one hash? Something like:

output_data = [{"user_id" => 1, "answer1"=>"cupcakes", "answer2"=>"Colorado", "answer3"=>"newspaper"},
{"user_id" => 2, "answer1"=>"fruitcake", "answer2"=>"Louisiana", "answer3"=>"tv"}]

Or perhaps with all the answers in an array:

output_data = [{"user_id" => 1, "answers"=>["cupcakes", "Colorado", "newspaper"]},
{"user_id" => 2, "answers"=>["fruitcake", "Louisiana", "tv"]}]

I'm not tied to this particular output. I need "user_id" as the key, with all the answers organized together. Any suggestions?

2 answers:

Answer 0: (score: 5)

You can do it like this:

Code

def convert(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
  end.map { |k,v| { "user_id"=>k, "answer"=>v } }
end

Example

convert(data)
  #=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
  #    {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]

Explanation

We have:

enum = data.each_with_object({})
  #=> #<Enumerator: [{"user_id"=>1, "answer"=>"cupcakes"},
  #                  {"user_id"=>1, "answer"=>"Colorado"},
  #                  {"user_id"=>1, "answer"=>"newspaper"},
  #                  {"user_id"=>2, "answer"=>"fruitcake"},
  #                  {"user_id"=>2, "answer"=>"Louisiana"},
  #                  {"user_id"=>2, "answer"=>"tv"}]:
  #   each_with_object({})> 

We can convert the enumerator to an array to see the values that will be passed to the block:

a = enum.to_a 
  #=> [[{"user_id"=>1, "answer"=>"cupcakes"}, {}],
  #    [{"user_id"=>1, "answer"=>"Colorado"}, {}],
  #    [{"user_id"=>1, "answer"=>"newspaper"}, {}],
  #    [{"user_id"=>2, "answer"=>"fruitcake"}, {}],
  #    [{"user_id"=>2, "answer"=>"Louisiana"}, {}],
  #    [{"user_id"=>2, "answer"=>"tv"}, {}]]

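As you can see, the enumerator has six elements, each a two-element array consisting of an element of data and an initially empty hash.

The key is that I'm using the form of Hash#update (aka merge!) that takes a block to determine the values of keys that are present in both hashes being merged.

The first element of enum is passed to the block and assigned to the block variables as follows: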

g, h = enum.next
  #=> [{"user_id"=>1, "answer"=>"cupcakes"}, {}] 
g #=> {"user_id"=>1, "answer"=>"cupcakes"} 
h #=> {} 

So the block calculation is:

h.update(g["user_id"]=>[g["answer"]])
  # {}.update(1=>["cupcakes"])
  #=> {1=>["cupcakes"]}
h #=> {1=>["cupcakes"]}

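update's block is not used for this first merge operation because (before the merge) h has no key 1. In a later operation g["user_id"] #=> 1 again. At that point the block is used to determine the value of key 1.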
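
To see that behavior in isolation, here is a minimal sketch. The helper name add_answer is made up for illustration; it only wraps the same Hash#update call used in convert.

```ruby
# add_answer is a hypothetical helper name, not part of the answer's code.
# It merges one record into h, using the block only when the key already exists.
def add_answer(h, g)
  h.update(g["user_id"] => [g["answer"]]) { |_key, old, new| old + new }
end

h = {}
add_answer(h, { "user_id" => 1, "answer" => "cupcakes" }) # key 1 absent: block not called
add_answer(h, { "user_id" => 1, "answer" => "Colorado" }) # key 1 present: block returns old + new
# h now maps 1 to ["cupcakes", "Colorado"]
```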

This results in:

h = data.each_with_object({}) do |g,h|
  h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
end
  #=> { 1=>["cupcakes", "Colorado", "newspaper"],
  #     2=>["fruitcake", "Louisiana", "tv"] } 

Mapping h's key-value pairs to the desired array of hashes is then straightforward.
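
That last step, taken on its own, might look like the sketch below. The name to_records is hypothetical; the body is just the map call from the end of convert.

```ruby
# to_records is a hypothetical name for the final step of convert.
# Hash#map yields each key-value pair; the result is an array of hashes.
def to_records(h)
  h.map { |user_id, answers| { "user_id" => user_id, "answer" => answers } }
end

grouped = { 1 => ["cupcakes", "Colorado", "newspaper"],
            2 => ["fruitcake", "Louisiana", "tv"] }
records = to_records(grouped)
# records.first is the hash for user 1, with all three answers in an array
```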

Alternative

Another way to do this, appending to a default array instead of merging hashes, is as follows:

data.each_with_object(Hash.new { |h,k| h[k]=[] }) do |g,h|
  h[g["user_id"]] << g["answer"]
end.map { |k,v| { "user_id"=>k, "answer"=>v } }
  #=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
  #    {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]

This gives the hash a default value of an empty array whenever h[k] is to be modified and h has no key k. For example:

h = Hash.new { |h,k| h[k]=[] }
  #=> {} 
h[:cat] << 'boots'
  #=> ["boots"] 
h #=> {:cat=>["boots"]} 
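
If the question's first output shape (numbered "answer1", "answer2", ... keys) is what you want, the same grouping can probably be adapted along these lines. This is only a sketch: convert_numbered is a made-up name, and the numbering simply follows the order in which the answers appear in data.

```ruby
# convert_numbered is a hypothetical name, not part of the answer above.
# It groups answers per user, then numbers them "answer1", "answer2", ...
def convert_numbered(arr)
  grouped = arr.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
    h[g["user_id"]] << g["answer"]
  end
  grouped.map do |user_id, answers|
    record = { "user_id" => user_id }
    # with_index(1) starts the numbering at 1 rather than 0
    answers.each.with_index(1) { |answer, i| record["answer#{i}"] = answer }
    record
  end
end
```

Calling convert_numbered(data) with the data from the question should give one hash per user, each with "user_id" followed by its numbered answer keys.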

Answer 1: (score: 5)

Your expected result does not make sense. To preserve the "answer" information, you need to keep the answers in an array.

data.group_by{|h| h["user_id"]}.each{|_, v| v.map!{|h| h["answer"]}}
# =>
# {
#   1=>["cupcakes", "Colorado", "newspaper"],
#   2=>["fruitcake", "Louisiana", "tv"]
# }
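
The one-liner above changes the grouped arrays in place with map!. A variant that avoids in-place mutation, assuming Ruby 2.4 or later for Hash#transform_values, could be sketched as follows (answers_by_user is a hypothetical name):

```ruby
# answers_by_user is a hypothetical name; requires Ruby 2.4+ (Hash#transform_values).
# Groups the records by "user_id", then keeps only the "answer" of each record.
def answers_by_user(data)
  data.group_by { |h| h["user_id"] }
      .transform_values { |rows| rows.map { |h| h["answer"] } }
end
```

It should return the same user_id-to-answers hash as the group_by/map! version.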

Strings like "user_id" and "answer" are redundant, and you should avoid having them in the data unless they help make it clearer in some way.