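I have an array named records that contains thousands of hashes (see the first array below). Each hash currently has two fields, id and parent_id. I want to add a new field named updated_at, whose value is stored in the database (see the second array below).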
records = [{"id"=>3, "parent_id"=>2},
{"id"=>4, "parent_id"=>2}]
records = [{"id"=>3, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:35 UTC"},
{"id"=>4, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:34 UTC"}]
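My first approach is the one below, but it runs one database query per hash, so with 1K hashes in the array it runs 1K queries, which I think is poor from a performance standpoint.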
records.each do |record|
  record['updated_at'] = Record.find(record['id']).updated_at.utc.to_s
end
Can you suggest a better solution?
Answer 0 (score: 1)
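How about something like this? It queries in batches, fetching the rows for one slice of the array at a time. Tune the each_slice size to whatever performs well for your data...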
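records.each_slice(250) do |batch|
  # One query per slice; the block variable no longer shadows the outer records.
  ids = batch.map { |r| r['id'] }
  results = Record.select([:id, :updated_at]).find(ids)
  batch.each do |rec|
    # rec is a Hash, so read its id with rec['id'] rather than rec.id.
    result = results.find { |res| res.id == rec['id'] }
    rec['updated_at'] = result.updated_at.utc.to_s
  end
end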
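To see how the slice size translates into the number of round trips, here is a minimal sketch in plain Ruby that runs without Rails. The helpers fake_updated_at and fetch_rows are hypothetical stand-ins for the table contents and for a call like Record.select([:id, :updated_at]).find(ids); they are assumptions made for illustration, not part of Active Record.

```ruby
# Hypothetical stand-in for the stored timestamp of a given id.
def fake_updated_at(id)
  Time.utc(2014, 3, 21, 20, 0, 0) + id
end

# Hypothetical stand-in for one batched database query:
# returns one row (as a Hash) per requested id.
def fetch_rows(ids)
  ids.map { |id| { 'id' => id, 'updated_at' => fake_updated_at(id) } }
end

# Adds 'updated_at' to every hash, issuing one simulated query per slice.
# Returns the number of simulated queries.
def add_updated_at_in_slices(records, slice_size)
  queries = 0
  records.each_slice(slice_size) do |batch|
    rows = fetch_rows(batch.map { |rec| rec['id'] })
    queries += 1
    batch.each do |rec|
      row = rows.find { |r| r['id'] == rec['id'] }
      rec['updated_at'] = row['updated_at'].to_s
    end
  end
  queries
end

records = (1..1000).map { |id| { 'id' => id, 'parent_id' => 2 } }
puts add_updated_at_in_slices(records, 250) # 1000 records / 250 per slice => 4
```

A smaller slice means more queries but a shorter array to scan for each record, which is why the size is worth tuning against real data.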
Answer 1 (score: 1)
How about this?
# pluck returns a plain Array of [id, updated_at] pairs, so filter with where first.
plucked_records = Record.where(id: records.map { |a| a.fetch("id") }).pluck(:id, :updated_at)
records.map! do |record|
  plucked_records.each do |plucked_record|
    record["updated_at"] = plucked_record.last.utc.to_s if plucked_record.first == record["id"]
  end
  record
end
Someone can probably improve on this. :)
Answer 2 (score: 0)
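After a lot of benchmarking and trying different algorithms, I came up with a solution that runs very fast and looks like the most efficient one so far.
The idea is to convert the array of records returned by the database into a hash, because looking up an item in a hash is much faster than searching for it in an array.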
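As a minimal sketch of that idea in plain Ruby (no database involved): the rows array below is hypothetical sample data standing in for what the query would return, and merge_updated_at is a helper name made up for this illustration.

```ruby
# Builds an id => formatted timestamp index from the fetched rows, then
# fills each record with a single hash lookup instead of an array scan.
def merge_updated_at(records, rows)
  index = rows.map { |row| [row[:id], row[:updated_at].utc.to_s] }.to_h
  records.each { |rec| rec['updated_at'] = index[rec['id']] }
end

# Hypothetical rows standing in for the database result (order does not matter).
rows = [
  { id: 4, updated_at: Time.utc(2014, 3, 21, 20, 44, 34) },
  { id: 3, updated_at: Time.utc(2014, 3, 21, 20, 44, 35) }
]
records = [{ 'id' => 3, 'parent_id' => 2 }, { 'id' => 4, 'parent_id' => 2 }]

merge_updated_at(records, rows)
p records
```

Building the index takes one pass over the rows, and each record then costs one lookup, so the total work grows linearly with the number of records.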
The timings below come from benchmarks run against an array of about 4.5K hashes.
# My last approach
# Converting the returned records Array into a Hash (thus faster searches)
# Benchmarks average results: 0.5 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
hash_records = Hash[db_records.map { |r| [r.id, r.updated_at.utc.to_s] }]
records.each do |rec|
  rec["updated_at"] = hash_records[rec["id"]]
end
# Original approach
# Doing a SQL query for each pair (4.5K queries against MySQL)
# Benchmarks average results: ~10 seconds
records.each do |rec|
  db_rec = Record.find(rec['id'])
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end
# Kirti's approach (slightly improved). Thanks Kirti!
# Unfortunaly searching into a lar
# Doing a single SQL query for all the pairs (then find in the array)
# Benchmarks average results: ~18 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
records.each do |rec|
  db_rec = db_records.find { |f| f.id == rec["id"] }
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end
# Nick's approach. Thanks Nick! Very good solution.
# Mixed solution leveraging both SQL and Ruby using each_slice.
# Very interesting results:
# [slice, seconds]:
# 5000, 18.0
# 1000, 4.3
# 500, 2.6
# 250, 1.5
# 100, 1.0
# 50, 0.9 <- :)
# 25, 1.0
# 10, 1.8
# 5, 2.3
# 1, 10.0
# Optimal slice value is 50 elements! (for this scenario)
# A scenario with a much costlier SQL query might require a larger slice size
slice = 50
records.each_slice(slice) do |recs|
  ids = recs.map { |pair| pair['id'] }
  db_records = Record.select([:id, :updated_at]).find(ids)
  recs.each do |rec|
    db_rec = db_records.find { |f| f.id == rec["id"] }
    rec["updated_at"] = db_rec.updated_at.utc.to_s
  end
end
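For anyone who wants to compare the two lookup strategies without a database, here is a rough sketch in plain Ruby. It only isolates the in-memory lookup cost (Array#find per record versus a prebuilt Hash) and does not reproduce the query times or the Active Record overhead behind the numbers above. The data set and the helper names timed, lookup_with_array_find and lookup_with_hash are made up for this illustration.

```ruby
# Runs the block and returns [elapsed_seconds, block_result].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - start, result]
end

# Linear scan per record: roughly n * n comparisons in total.
def lookup_with_array_find(ids, rows)
  ids.map { |id| rows.find { |row| row[:id] == id }[:updated_at] }
end

# One pass to build the index, then one hash lookup per record.
def lookup_with_hash(ids, rows)
  index = rows.map { |row| [row[:id], row[:updated_at]] }.to_h
  ids.map { |id| index[id] }
end

# Hypothetical data set, about the same size as the array in the benchmarks.
ids  = (1..4500).to_a
rows = ids.shuffle.map { |id| { id: id, updated_at: Time.utc(2014, 3, 21) + id } }

scan_time, scan_result = timed { lookup_with_array_find(ids, rows) }
hash_time, hash_result = timed { lookup_with_hash(ids, rows) }

puts format('Array#find: %.4fs, Hash lookup: %.4fs', scan_time, hash_time)
puts "same result: #{scan_result == hash_result}"
```

The absolute timings depend on the machine and Ruby version; the point is only how the gap widens as the array grows.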