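I'm working with a large dataset made up of several arrays of hashes. Every hash has the same key-value pair ("date" and a date value) as its first element.
The array of hashes I need to parse (@data["snapshot"]) has the format below. Note that @data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2] have exactly the same format and the same dates, but their totals differ. In the resulting hash I need a key-value pair that identifies which source each total came from.
@data["snapshot"][0] looks like this: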
[{"date"=>"1455672010", "total"=>"**817**", "I"=>"1", "L"=>"3", "M"=>"62", "H"=>"5", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**40**", "I"=>"8", "L"=>"5", "M"=>"562", "H"=>"125", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**555**", "I"=>"10", "L"=>"1", "M"=>"93", "H"=>"121", "C"=>"0"}]
@data["snapshot"][1] looks like this:
[{"date"=>"1455672010", "total"=>"**70**", "I"=>"1", "L"=>"9", "M"=>"56", "H"=>"25", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**54**", "I"=>"8", "L"=>"2", "M"=>"5", "H"=>"5", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**25**", "I"=>"0", "L"=>"9", "M"=>"93", "H"=>"12", "C"=>"0"}]
@data["snapshot"][2] looks like this:
[{"date"=>"1455672010", "total"=>"**70**", "I"=>"12", "L"=>"5", "M"=>"5662", "H"=>"125", "C"=>"0"},
{"date"=>"1455595298", "total"=>"**43212**", "I"=>"56", "L"=>"6", "M"=>"5662", "H"=>"125", "C"=>"0"},
{"date"=>"1455336016", "total"=>"**55525**", "I"=>"100", "L"=>"19", "M"=>"5593", "H"=>"121", "C"=>"0"}]
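My question boils down to this:
How do I convert (flatten?) the three existing arrays of hashes (@data["snapshot"][0], @data["snapshot"][1] and @data["snapshot"][2]) into a single array of hashes in the following format?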
[{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
{"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"},
{"date"=>"1455336016", "CameFromDataSource0"=>"555", "CameFromDataSource1"=>"25", "CameFromDataSource2"=>"55525"}]
Answer 0 (score: 2)
Here is one way to do it.
Code
def convert(data)
  data.each_with_object({}) { |a,h|
    a.each { |g| h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }.
    map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h|
      h["key#{i}"] = e } }
end
Example
convert(data)
#=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
# {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
# {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
Two steps
As you can see, I did this in two steps. First, construct a hash that maps each date to an array of totals:
f = data.each_with_object({}) { |a,h| a.each { |g|
  h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }
#=> {"1455672010"=>["817", "70", "70"],
# "1455595298"=>["40", "54", "43212"],
# "1455336016"=>["555", "25", "55525"]}
Here I use the form of Hash#update (a.k.a. merge!) that takes a block ({ |_,o,n| o+n }) to determine the value for any key that is present in both of the hashes being merged.
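To make the block form of Hash#update concrete, here is a minimal sketch. The hashes `totals` and `incoming` are hypothetical samples built for illustration; they are not part of the question's data.

```ruby
# Hypothetical sample hashes, shaped like the intermediate hash built above.
totals   = { "1455672010" => ["817"] }
incoming = { "1455672010" => ["70"], "1455595298" => ["54"] }

# The block is called only for keys present in both hashes. It receives the
# key, the old value and the new value, and its return value is stored.
# Keys found only in `incoming` are copied over without calling the block.
totals.update(incoming) { |_key, old, new| old + new }

totals
#=> {"1455672010"=>["817", "70"], "1455595298"=>["54"]}
```

Because each value is a one-element array, `old + new` is array concatenation, which is how the totals for a given date accumulate in source order.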
Then convert that hash to the desired format:
f.map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h|
  h["key#{i}"] = e } }
#=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
# {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
# {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
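The output above uses "key0", "key1", ... whereas the question asks for "CameFromDataSource0" and so on. A possible variant, as a sketch only: the method name `convert_with_prefix` and the `key_prefix` parameter are hypothetical additions, and the sample data is a trimmed copy of the question's arrays (only "date" and "total" kept, two dates instead of three).

```ruby
# Trimmed copy of the question's three arrays, asterisks included.
data = [
  [{ "date" => "1455672010", "total" => "**817**" },
   { "date" => "1455595298", "total" => "**40**" }],
  [{ "date" => "1455672010", "total" => "**70**" },
   { "date" => "1455595298", "total" => "**54**" }],
  [{ "date" => "1455672010", "total" => "**70**" },
   { "date" => "1455595298", "total" => "**43212**" }]
]

# Same two steps as above; key_prefix is a hypothetical parameter.
def convert_with_prefix(data, key_prefix = "CameFromDataSource")
  # Step 1: date => [total from source 0, total from source 1, ...]
  by_date = data.each_with_object({}) do |snapshot, h|
    snapshot.each do |g|
      # [/\d+/] keeps only the first run of digits, dropping the asterisks.
      h.update(g["date"] => [g["total"][/\d+/]]) { |_, o, n| o + n }
    end
  end
  # Step 2: one hash per date, with one prefixed key per source index.
  by_date.map do |date, totals|
    totals.each_with_index.with_object({ "date" => date }) do |(total, i), row|
      row["#{key_prefix}#{i}"] = total
    end
  end
end

result = convert_with_prefix(data)
#=> [{"date"=>"1455672010", "CameFromDataSource0"=>"817",
#     "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
#    {"date"=>"1455595298", "CameFromDataSource0"=>"40",
#     "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"}]
```

One caveat with this approach: the index in each key is the position within the per-date array, so it matches the source index only if every source contains every date.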
Answer 1 (score: 2)
TL;DR:
snapshots.each_with_object(Hash.new {|hsh, date| hsh[date] = { "date" => date } })
  .with_index do |(snapshot, hsh), i|
    snapshot["data"].each {|datum| hsh[datum["date"]]["data#{i}"] = datum["total"] }
  end.values
I'll break this down so you can see how each part works. Here is our data (irrelevant keys omitted for clarity):
snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" },
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]
Most of the magic is here:
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
Here we use the Hash's default proc to initialize new keys automatically, like this:
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
p result_hash["1455672010"]
# => { "date" => "1455672010" }
p result_hash
# => { "1455672010" => { "date" => "1455672010" } }
Simply accessing result_hash[foo] creates the hash { "date" => foo } and assigns it to result_hash[foo]. That makes the following possible:
result_hash["1455672010"]["data0"] = "817"
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" } }
Magic!
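One detail worth knowing about this trick: the default proc runs on a plain [] lookup of a missing key, but not on key? or fetch. The following sketch reuses the result_hash definition from above; the checks themselves are additions for illustration.

```ruby
result_hash = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

# key? and fetch do not invoke the default proc, so nothing is created.
result_hash.key?("1455672010")        #=> false
result_hash.fetch("1455672010", nil)  #=> nil
result_hash.size                      #=> 0

# A plain [] read does invoke it, and the new entry is stored in the hash.
result_hash["1455672010"]             #=> {"date"=>"1455672010"}
result_hash.size                      #=> 1
```

So merely reading result_hash[some_date] adds an entry. Use key? or fetch when you only want to check whether a date has been seen.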
Now suppose we have the following data:
data = [ { "date" => "1455672010", "total" => "817" },
         { "date" => "1455595298", "total" => "40" },
         { "date" => "1455336016", "total" => "555" } ]
With our magic result_hash, we can do this:
data.each do |datum|
  result_hash[datum["date"]]["data0"] = datum["total"]
end
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" },
# "1455595298" => { "date" => "1455595298", "data0" => "40" },
# "1455336016" => { "date" => "1455336016", "data0" => "555" } }
See where I'm going with this? Here is all of our data:
snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" },
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" },
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]
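Instead of hard-coding "data0", we can iterate over the snapshots array with each_with_index and build that key ("data0", then "data1", and so on) on each iteration. Inside that loop we do exactly what we did above, but with the "data" array from each hash in snapshots: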
result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
snapshots.each_with_index do |snapshot, i|
  data_key = "data#{i}"
  snapshot["data"].each do |datum|
    date = datum["date"]
    result_hash[date][data_key] = datum["total"]
  end
end
p result_hash.values
# => [ { "date" => "1455672010", "data0" => "817", "data1" => "70", "data2" => "70" },
# { "date" => "1455595298", "data0" => "40", "data1" => "54", "data2" => "43212" },
# { "date" => "1455336016", "data0" => "555", "data1" => "25", "data2" => "55525" } ]
Of course, this can be condensed somewhat, which is what I did in the TL;DR above.
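The condensed version at the top of this answer chains each_with_object with with_index, and the block parameters can look odd at first. Here is a minimal sketch of what the block receives; the `letters` array is hypothetical stand-in data, used only to keep the example small.

```ruby
# Hypothetical stand-in data, just to show the shape of the block arguments.
letters = ["a", "b", "c"]

seen = letters.each_with_object([]).with_index do |(letter, memo), i|
  # each_with_object yields [element, memo]; with_index then appends the
  # index, so the parameters destructure as (element, memo), index.
  memo << "#{i}:#{letter}"
end

# The chain returns the memo object passed to each_with_object.
seen #=> ["0:a", "1:b", "2:c"]
```

In the condensed version the memo is the hash with the default proc, which is why .values can be called directly on the result of the block.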