Ruby - Combining/flattening multiple arrays of hashes on a common hash key/value pair

Time: 2016-02-17 02:45:43

Tags: ruby hash flatten keyvaluepair

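I'm working with a large dataset that contains multiple arrays of hashes, all of which share a common key-value pair ("date" and a date value) as the first element of each hash.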

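The array of hashes I need to parse (@data["snapshot"]) is in the following format. Note that @data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2] have exactly the same format and the same dates, but their totals differ. In the resulting hash, I need a key-value pair that identifies where the data came from.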

@data["snapshot"][0] is as follows:

[{"date"=>"1455672010", "total"=>"**817**", "I"=>"1", "L"=>"3", "M"=>"62", "H"=>"5", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**40**", "I"=>"8", "L"=>"5", "M"=>"562", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**555**", "I"=>"10", "L"=>"1", "M"=>"93", "H"=>"121", "C"=>"0"}]

@data["snapshot"][1] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"1", "L"=>"9", "M"=>"56", "H"=>"25", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**54**", "I"=>"8", "L"=>"2", "M"=>"5", "H"=>"5", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**25**", "I"=>"0", "L"=>"9", "M"=>"93", "H"=>"12", "C"=>"0"}]

@data["snapshot"][2] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"12", "L"=>"5", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**43212**", "I"=>"56", "L"=>"6", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**55525**", "I"=>"100", "L"=>"19", "M"=>"5593", "H"=>"121", "C"=>"0"}]

My question ultimately boils down to this:

How can I convert (flatten?) the three existing arrays of hashes (@data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2]) into a single array of hashes in the following format?

[{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
 {"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"},   
 {"date"=>"1455336016", "CameFromDataSource0"=>"555", "CameFromDataSource1"=>"25", "CameFromDataSource2"=>"55525"}]

2 answers:

Answer 0 (score: 2)

Here is one way to do it.

Code

def convert(data)
  data.each_with_object({}) { |a,h|
    a.each { |g| h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }.
      map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
        h["key#{i}"] = e } }
end

Example

convert(data)
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}] 

Two steps

As you can see, I've done this in two steps. First, construct a hash:

f = data.each_with_object({}) { |a,h| a.each { |g|
  h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }
    #=> {"1455672010"=>["817", "70", "70"],
    #    "1455595298"=>["40", "54", "43212"],
    #    "1455336016"=>["555", "25", "55525"]} 

Here I use the form of Hash#update (aka merge!) that takes a block ({ |_,o,n| o+n }) to determine the value for any key that is present in both of the hashes being merged.
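To make the block form concrete, here is a minimal sketch on two made-up hashes (the names `counts` and `more` are illustrative and do not come from the question). The block is called only for keys that exist in both hashes; it receives the key, the old value and the new value, and whatever it returns is stored.

```ruby
# Illustrative hashes; these names are assumptions, not part of the original data.
counts = { "a" => ["1"] }
more   = { "a" => ["2"], "b" => ["3"] }

# "a" is present in both hashes, so the block concatenates the two arrays.
# "b" is a new key, so the block is not called and the value is stored as-is.
counts.update(more) { |_key, old, new| old + new }

counts
  #=> {"a"=>["1", "2"], "b"=>["3"]}
```

This is why each "total" is wrapped in a one-element array before merging: o+n then appends the new total to the totals already collected for that date.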

Then convert the hash to the desired format:

f.map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
  h["key#{i}"] = e } }
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]

Answer 1 (score: 2)

TL;DR

snapshots.each_with_object(Hash.new {|hsh, date| hsh[date] = { "date" => date } })
  .with_index do |(snapshot, hsh), i|
    snapshot["data"].each {|datum| hsh[datum["date"]]["data#{i}"] = datum["total"] }
  end.values

How it works

I'll break this down so you can see how each part works. Here is our data (with extraneous keys omitted for clarity):

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

Most of the magic happens here:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

Here we use the Hash default proc to automatically initialize new keys, like this:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
p result_hash["1455672010"]
# => { "date" => "1455672010" }

p result_hash
# => { "1455672010" => { "date" => "1455672010" } }

Simply accessing result_hash[foo] creates the hash { "date" => foo } and assigns it to result_hash[foo]. That makes the following possible:

result_hash["1455672010"]["data0"] = "817"
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" } }

Magic!
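The block form matters here. As a rough comparison (the variable names `shared` and `stored` are made up for this sketch), passing a default object to Hash.new hands back the same object for every missing key and never stores it, whereas the default proc above builds and stores a fresh hash per key:

```ruby
# Default *object*: every missing key returns the same hash, and nothing is stored.
shared = Hash.new({})
shared["1455672010"]["data0"] = "817"
shared.keys           #=> []
shared["1455595298"]  #=> {"data0"=>"817"}  (the single shared default object)

# Default *proc*: each missing key gets its own hash, stored under that key.
stored = Hash.new { |hsh, date| hsh[date] = { "date" => date } }
stored["1455672010"]["data0"] = "817"
stored.keys           #=> ["1455672010"]
stored["1455595298"]  #=> {"date"=>"1455595298"}
```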

Now suppose we have the following data:

data = [ { "date" => "1455672010", "total" => "817" }, 
         { "date" => "1455595298", "total" => "40" },
         { "date" => "1455336016", "total" => "555" } ]

Using our magic result_hash, we can do this:

data.each do |datum|
  result_hash[datum["date"]]["data0"] = datum["total"]
end
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" },
#      "1455595298" => { "date" => "1455595298", "data0" => "40" },
#      "1455336016" => { "date" => "1455336016", "data0" => "555" } }

See where I'm going with this? Here is all of our data:

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

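Instead of hard-coding "data0", we can iterate over the snapshots array with each_with_index and build that key ("data0", then "data1", and so on) on each iteration. Inside that loop we can do exactly what we did above, but using the "data" array from each hash in snapshots: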

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

snapshots.each_with_index do |snapshot, i|
  data_key = "data#{i}"

  snapshot["data"].each do |datum|
    date = datum["date"]
    result_hash[date][data_key] = datum["total"]
  end
end

p result_hash.values
# => [ { "date" => "1455672010", "data0" => "817", "data1" => "70", "data2" => "70" },
#      { "date" => "1455595298", "data0" => "40",  "data1" => "54", "data2" => "43212" },
#      { "date" => "1455336016", "data0" => "555", "data1" => "25", "data2" => "55525" } ]

Of course, this can be condensed a bit, which is what I've done in the TL;DR above.
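If you want the key names the question asked for ("CameFromDataSource0" and so on), the same loop can be wrapped in a small method. This is only a sketch: the method name `flatten_snapshots` and the `prefix:` keyword are assumptions made for illustration, and the sample data is shortened to two sources and two dates.

```ruby
# Hypothetical helper; the method name and the `prefix:` keyword are
# assumptions made for this sketch, not part of the original answer.
def flatten_snapshots(snapshots, prefix: "CameFromDataSource")
  # One result hash per date, created on first access by the default proc.
  result = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

  snapshots.each_with_index do |snapshot, i|
    snapshot["data"].each do |datum|
      result[datum["date"]]["#{prefix}#{i}"] = datum["total"]
    end
  end

  result.values
end

# Shortened sample data in the same shape as `snapshots` above.
sample = [
  { "data" => [ { "date" => "1455672010", "total" => "817" },
                { "date" => "1455595298", "total" => "40" } ] },
  { "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "54" } ] }
]

flatten_snapshots(sample)
# => [ { "date" => "1455672010", "CameFromDataSource0" => "817", "CameFromDataSource1" => "70" },
#      { "date" => "1455595298", "CameFromDataSource0" => "40",  "CameFromDataSource1" => "54" } ]
```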