Question

Update: I was originally overwriting the hash key, but I have since fixed that. Thanks for all the input so far.

The problem now is that the iteration takes hours to produce the data:

The customers CSV has 22,000 rows.

The fiber CSV has 170,000 rows.

require "csv"
require "haversine" # gem that provides Haversine.distance

fiber = CSV.read("fiber.csv", headers: true)
customers = CSV.read("customers.csv", headers: true)

hh = Hash.new { |hsh,key| hsh[key] = [] }

#for each customer, loop through all the fiber coords
customers.each do |c|
  fiber.each do |f|
    hh[c["cid"]].push Haversine.distance(c["lat"], c["lng"], f["lat"], f["lng"])
  end
end

vals = hh.map { |k, v| v.min } #returns the minimum value per row (which I want)

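Since I want to use these values outside the program/command line, I thought writing a CSV would be a good approach (other suggestions welcome).

However, since the nested loop above runs for hours without finishing, this is not an ideal approach.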

CSV.open("hash_output.csv", "wb") {|csv| vals.each {|elem| csv << [elem]} }

Any ideas on how to speed this up?
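One direction that might be worth trying, offered as a sketch rather than a measured fix: since only the minimum per customer is needed, the loop can keep a running minimum instead of pushing 170,000 distances into an array for every customer. This does not reduce the number of distance calculations (it is still customers × fiber points), but it avoids building the large arrays. The `haversine_km` helper and the sample rows below are hypothetical stand-ins so the snippet runs without the gem or the CSV files, and it assumes the coordinates are already numeric (for example by reading with `converters: :numeric`).

```ruby
EARTH_RADIUS_KM = 6371.0

# Hand-written great-circle distance in kilometres (inputs in degrees).
# Stand-in for Haversine.distance so the sketch has no gem dependency.
def haversine_km(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Hypothetical sample rows standing in for the parsed CSV data.
customers = [
  { "cid" => "1", "lat" => 40.0, "lng" => -75.0 },
  { "cid" => "2", "lat" => 41.0, "lng" => -74.0 }
]
fiber = [
  { "lat" => 40.0, "lng" => -75.0 },
  { "lat" => 42.0, "lng" => -73.0 }
]

# Track only the smallest distance seen so far for each customer.
nearest = {}
customers.each do |c|
  best = Float::INFINITY
  fiber.each do |f|
    d = haversine_km(c["lat"], c["lng"], f["lat"], f["lng"])
    best = d if d < best
  end
  nearest[c["cid"]] = best
end
```

Here `nearest` maps each `cid` to its closest fiber distance, so it could be written to the CSV directly without a separate `min` pass.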

Answer 1

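I think the problem is that you are overwriting the same hash entry on every pass through the loop. I would do something like this: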

hh = Hash.new { |hsh,key| hsh[key] = [] }
#for each customer, loop through all the fiber coords
customers.each do |c|      
  fiber.each do |f|
    hh[c["last Name"]].push Haversine.distance(c["lat"], c["lng"], f["lat"], f["lng"])
  end
end

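This way, the key will be the customer's last name and the value will be an array of distances. The resulting data structure looks like this: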

{ 
   "DOE" => [922224.16, 920129.46, 919214.42],
   ...
}
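To get from that structure to one minimum per key, a minimal sketch could look like the following. The `"ROE"` entry and its values are made up for illustration, and `transform_values` assumes Ruby 2.4 or newer.

```ruby
# Sample data in the shape shown above; the "ROE" row is hypothetical.
hh = {
  "DOE" => [922224.16, 920129.46, 919214.42],
  "ROE" => [1500.0, 1200.5, 1800.25]
}

# Replace each array of distances with its smallest element, keeping the keys.
mins = hh.transform_values(&:min)
```

Unlike `hh.map { |k, v| v.min }`, this keeps the key next to each minimum, which may be handy when writing the results out.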

Ruby hash key with multiple values: returning the minimum value in a timely manner