Update an array of hashes with records from the database, adding a new field to each existing hash

Asked: 2014-03-21 21:14:36

Tags: ruby arrays performance activerecord hash

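I have an array called records containing thousands of hashes (see the first array below). Each hash currently has two fields, id and parent_id. I want to add a new field called updated_at, whose value is stored in the database (see the second array below).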

records = [{"id"=>3, "parent_id"=>2}, 
           {"id"=>4, "parent_id"=>2}]

records = [{"id"=>3, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:35 UTC"}, 
           {"id"=>4, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:34 UTC"}] 

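My first approach is shown below, but it runs one database query per hash, so an array of 1K hashes triggers 1K queries, which I don't think is good from a performance point of view.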

records.each do |record|
  record['updated_at'] = Record.find(record['id']).updated_at.utc.to_s
end

Can you suggest a better solution?

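3 answers:

Answer 0 (score: 1)

How about something like this? It batches the queries by fetching one slice at a time. Tune the each_slice size to whatever performs well for your data...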

records.each_slice(250) do |batch|
  ids = batch.map { |r| r['id'] }
  # One query per slice, fetching only the columns needed
  results = Record.select([:id, :updated_at]).find(ids)
  batch.each do |rec|
    # rec is a Hash, so read its id with ['id'] rather than .id
    result = results.find { |res| res.id == rec['id'] }
    rec['updated_at'] = result.updated_at.utc.to_s
  end
end
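
To see why batching helps, here is a minimal sketch that counts round trips against an in-memory stand-in for the table. FAKE_TABLE and fetch_rows are hypothetical names invented for this illustration, not ActiveRecord APIs; real code would run the Record query from the answer instead.

```ruby
# Hypothetical stand-in for the records table: id => updated_at string.
FAKE_TABLE = (1..1000).map { |id| [id, "2014-03-21 20:44:35 UTC"] }.to_h

# Hypothetical helper simulating one SQL query that returns [id, updated_at] rows.
def fetch_rows(ids, counter)
  counter[:queries] += 1
  ids.map { |id| [id, FAKE_TABLE.fetch(id)] }
end

records = (1..1000).map { |id| { "id" => id, "parent_id" => 2 } }
counter = { queries: 0 }

records.each_slice(250) do |batch|
  # One simulated query per slice of 250 ids.
  rows = fetch_rows(batch.map { |r| r["id"] }, counter)
  batch.each do |rec|
    row = rows.find { |id, _| id == rec["id"] }
    rec["updated_at"] = row.last
  end
end

puts counter[:queries] # 1000 records / 250 per slice => 4
```

With 1,000 records and slices of 250, the per-record approach would issue 1,000 queries, while the sliced version issues 4.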

Answer 1 (score: 1)

How about this?

# pluck returns a plain Array of [id, updated_at] pairs, so filter with where before plucking
plucked_records = Record.where(id: records.map { |a| a.fetch("id") }).pluck(:id, :updated_at)

records.map! do |record|
  plucked_records.each do |plucked_record|
    record["updated_at"] = plucked_record.last.utc.to_s if plucked_record.first == record["id"]
  end
  record
end

Maybe someone can improve on this. :)

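Answer 2 (score: 0)

After a lot of benchmarking and trying different algorithms, I came up with a solution that runs very fast and looks like the most efficient one so far.

The idea is to convert the resulting array of DB records into a hash, because looking up an item in a hash is much faster than searching an array.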
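
The difference in lookup cost can be checked without a database. The sketch below is only an illustration under assumed data: rows is a hypothetical array of [id, updated_at] pairs standing in for what a single query would return, and the timings it prints will vary by machine.

```ruby
require "benchmark"

# Hypothetical data: 4,500 [id, updated_at] rows, as if returned by one query.
rows = (1..4500).map { |id| [id, "2014-03-21 20:44:35 UTC"] }
ids  = rows.map(&:first).shuffle

# Linear scan: each lookup walks the array until it finds the id.
array_time = Benchmark.realtime do
  ids.each { |id| rows.find { |row_id, _| row_id == id } }
end

# Hash lookup: build the index once, then each lookup is a single hash access.
index = rows.to_h
hash_time = Benchmark.realtime do
  ids.each { |id| index[id] }
end

puts format("Array#find: %.4fs, Hash#[]: %.4fs", array_time, hash_time)
```

Array#find is a linear scan, so the total work grows with the square of the number of records, whereas the hash index is built once and each lookup takes roughly constant time.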

The timings below come from benchmarks run against an array of about 4.5K hashes.

# My last approach
# Converting the returned records Array into a Hash (thus faster lookups)
# Benchmarks average results: 0.5 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
hash_records = Hash[db_records.map { |r| [r.id, r.updated_at.utc.to_s] }]
records.each do |rec|
  rec["updated_at"] = hash_records[rec["id"]]
end

# Original approach
# Doing a SQL query for each pair (4.5K queries against MySQL)
# Benchmarks average results: ~10 seconds
records.each do |rec|
  db_rec = Record.find(rec['id'])
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end

# Kirti's approach (slightly improved). Thanks Kirti! 
# Unfortunately searching into a lar
# Doing a single SQL query for all the pairs (then find in the array)
# Benchmarks average results: ~18 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
records.each do |rec|
  db_rec = db_records.find { |f| f.id == rec["id"] }
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end  

# Nick's approach. Thanks Nick! very good solution.
# Mixed solution leveraging both SQL and Ruby using each_slice.
# Very interesting results:
# [slice, seconds]:
# 5000, 18.0 
# 1000, 4.3
#  500, 2.6
#  250, 1.5
#  100, 1.0
#   50, 0.9 <- :)
#   25, 1.0
#   10, 1.8
#    5, 2.3
#    1, 10.0
# Optimal slice value is 50 elements! (for this scenario)
# A scenario with a much costlier SQL query might require a larger slice size
slice = 50
records.each_slice(slice) do |recs|
  ids = recs.map { |pair| pair['id'] }
  db_records = Record.select([:id, :updated_at]).find(ids)
  recs.each do |rec|
    db_rec = db_records.find { |f| f.id == rec["id"] }
    rec["updated_at"] = db_rec.updated_at.utc.to_s
  end
end
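
The slicing and the hash lookup can also be combined, so that each slice is indexed by id instead of being scanned with find. This is a sketch only and was not part of the benchmarks above; table and fetch_rows are hypothetical stand-ins for the Record query so that the example runs without a database.

```ruby
# Hypothetical stand-in for the database: id => updated_at string.
table = (1..120).map { |id| [id, "2014-03-21 20:44:35 UTC"] }.to_h

# Hypothetical helper simulating one query that returns [id, updated_at] rows.
fetch_rows = ->(ids) { ids.map { |id| [id, table.fetch(id)] } }

records = (1..120).map { |id| { "id" => id, "parent_id" => 2 } }

records.each_slice(50) do |recs|
  # One simulated query per slice, indexed by id for constant-time lookups.
  by_id = fetch_rows.call(recs.map { |rec| rec["id"] }).to_h
  recs.each { |rec| rec["updated_at"] = by_id[rec["id"]] }
end

p records.first
```

Whether this beats the single-query hash version depends on the query cost and the data size, so it would need to be measured in the same way as the approaches above.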