Question

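I have several CSV files containing product names and prices. A given product may or may not be present in both files. For each product, I have to find the highest and the lowest price across these files.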

I load the products from both files into a single array:

require 'csv'

list = []
Dir["./*.csv"].each do |file|
  CSV.foreach(file, headers: true) do |row|
    # Keep the parsed fields and append the name of the input file
    list.push(row.fields + [file])
  end
end

The array list looks like this:

[["5893105","2.38", "weightOrSomethingIrrelevant", "./FIAT_2.csv"]]

This is the main algorithm:

result = []
until list.empty?
  tmpPart = list[0][0]
  # All rows for the same part number (rescans the whole list on every pass)
  tmpParts = list.select { |part, _price| part == tmpPart }
  # Compare prices as numbers; as strings, "10.0" would sort below "9.5"
  tmpMaxRow = tmpParts.max_by { |_part, price| price.to_f }
  tmpMinRow = tmpParts.min_by { |_part, price| price.to_f }
  tmpWeight = list[0][2].to_f != 0.0 ? list[0][2].to_s : "Undefined"
  result.push([tmpPart, tmpWeight, tmpMaxRow[1], tmpMaxRow.last, tmpMinRow[1], tmpMinRow.last])
  list = list - tmpParts
end

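The input files are large (more than 200,000 rows), so the efficiency of my algorithm is a problem: it takes about half a second to process one row.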

I would like to know whether there is a better way to write this application.

Answer 1

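I would break it into a few parts:
1) I suggest having a table that represents the files (file name, location, row count, etc.), linked to a products table that holds the row data of each file.
2) A script/function that ingests the files and stores the rows as DB records.
3) A script/function that analyzes the rows and looks up products by name, using the database to pull the price information with min/max.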

This can be refined later to deal with issues such as inconsistent product naming and product occurrences.
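
The ingest-then-aggregate idea above can also be sketched without a database: read every file once and keep a running minimum and maximum per product in a Hash. This is only a minimal sketch under some assumptions. The helper name price_extremes is hypothetical, each file is assumed to have a header row, and the columns are assumed to be part number, price, weight in that order, as in the sample row from the question. The weight is taken from the first row seen for each product.

```ruby
require 'csv'

# Hypothetical helper (not from the original post): returns one
# [part, weight, max_price, max_file, min_price, min_file] row per product,
# reading every CSV file exactly once.
def price_extremes(files)
  stats = {}
  files.each do |file|
    CSV.foreach(file, headers: true) do |row|
      # Assumed column order: part number, price, weight
      part, price, weight = row.fields
      next if part.nil? || price.nil?

      value = price.to_f
      # First sighting of a product initialises both extremes to this row
      entry = (stats[part] ||= { weight: weight, max: value, max_file: file,
                                 min: value, min_file: file })
      if value > entry[:max]
        entry[:max] = value
        entry[:max_file] = file
      end
      if value < entry[:min]
        entry[:min] = value
        entry[:min_file] = file
      end
    end
  end

  stats.map do |part, e|
    weight = e[:weight].to_f != 0.0 ? e[:weight].to_s : "Undefined"
    [part, weight, e[:max], e[:max_file], e[:min], e[:min_file]]
  end
end
```

It could be called as price_extremes(Dir["./*.csv"]). Because a Hash lookup takes constant time on average, the total work grows roughly linearly with the number of rows, instead of rescanning the whole list once per product.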
