Question

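I'd like to extract specific values from a single field in a CSV file, but everything I've found in my research points to using hashes to pull out a whole column of data rather than individual values.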

Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!

For example, I'd like to extract the values 2548 and 4352, add them together, and put the total in a new row.

I've used:

CSV.foreach("file.csv") { |row| col_data_new << row[5] }

to pull the values of a column into an array, but this time I only want individual values.

Answer 1

Yes, hashes are the way to go:

require 'csv'

data = 'Name,Times arrived,Total $ spent,Food feedback
Dan,34,2548,Lovin it!
Maria,55,5054,"Good, delicious food"
Carlos,22,4352,"I am ""pleased"", but could be better"
Stephany,34,6542,I want bigger steaks!!!!!
'

CSV.parse(data, headers: :first_row).map{ |row| row["Total $ spent"] }
# => ["2548", "5054", "4352", "6542"]

Pretend that

CSV.parse(data, headers: :first_row)

is really

CSV.foreach('some/file.csv', headers: :first_row)

and that the data really lives in a file.

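The reason you want to use headers: :first_row is that it tells CSV to gobble up the first line. CSV then returns each record as a hash-like row (a CSV::Row) keyed by the header fields, which makes it much easier to retrieve specific fields by name.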

From the documentation:

:headers

If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers.
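To make the file-based variant concrete, here is a minimal sketch of CSV.foreach with headers. The temporary file and the shortened sample data are assumptions made only so the example runs on its own; in real use you would pass the path of your existing file.

```ruby
require 'csv'
require 'tempfile'

# Write a shortened copy of the sample data to a temporary file
# (illustrative only; substitute the path of your real CSV file).
file = Tempfile.new(['visits', '.csv'])
file.write(<<~CSV_DATA)
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
CSV_DATA
file.close

spent = []
CSV.foreach(file.path, headers: :first_row) do |row|
  # row is a CSV::Row, so fields can be looked up by header name
  spent << row['Total $ spent']
end
file.unlink

spent
# => ["2548", "5054"]
```

CSV.foreach reads the file one row at a time, so it avoids loading the whole file into memory the way CSV.parse on a string does.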

Alternate ways of doing this are:

spent = CSV.parse(data).map{ |row| row[2] }
spent.shift

spent
# => ["2548", "5054", "4352", "6542"]

spent.shift drops the first element of the array, which is the header field for that column, leaving the array containing only values.

Or:

spent = []
skip_headers = true
CSV.parse(data).each do |row| 

  if skip_headers
    skip_headers = false
    next
  end

  spent << row[2]
end

spent
# => ["2548", "5054", "4352", "6542"]

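Similar in effect to the shift statement above, next tells Ruby to jump to the next iteration of the loop without processing the rest of the instructions in the block, so the header record is skipped in the final output.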

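Once you have the values for the desired field, you can selectively extract specific ones. If you want the values "2548" and "4352", you need a way of determining which rows they are in. Using arrays (the non-header method) makes that more awkward, so I'll do it using hashes again: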

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when 'Dan', 'Carlos'
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

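Notice that it's clear what is going on, which is important in code. Using case and when lets me easily add additional names to include. It acts like a chain of "or" conditions in an if statement, but without the extra noise.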
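If only one person's value is needed, the same header-based lookup can be combined with find. This is a sketch rather than the one right way to do it; the variable names table, row, and carlos_spent are my own.

```ruby
require 'csv'

data = <<~CSV_DATA
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
  Carlos,22,4352,"I am ""pleased"", but could be better"
  Stephany,34,6542,I want bigger steaks!!!!!
CSV_DATA

table = CSV.parse(data, headers: :first_row)

# find returns the first matching row, or nil if no row matches
row = table.find { |r| r['Name'] == 'Carlos' }

# Guard against nil so a missing name does not raise NoMethodError
carlos_spent = row && row['Total $ spent']

carlos_spent
# => "4352"
```

Note that find stops at the first match, so this assumes names are unique in the file.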

each_with_object is similar to inject, except that it's cleaner when we need to aggregate values into an Array, Hash, or some other object.
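A small side-by-side sketch of that difference, using a made-up list of names rather than the CSV data:

```ruby
names = %w[Dan Maria Carlos]

# inject: the block's return value becomes the next accumulator,
# so the block has to end by returning the hash explicitly
lengths_inject = names.inject({}) do |hash, name|
  hash[name] = name.length
  hash
end

# each_with_object: the same object is passed to every iteration and
# the block's return value is ignored; note the reversed block arguments
lengths_ewo = names.each_with_object({}) do |name, hash|
  hash[name] = name.length
end

lengths_inject == lengths_ewo
# => true
```

Forgetting the trailing hash in the inject version is a common mistake, which is one reason each_with_object reads more cleanly here.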

Summing the array is easy, and there are many different ways to get there, but I'd use:

spent.map(&:to_i).inject(:+) # => 6900

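Basically, that converts the individual elements to integers and adds them together. (There's more to it than that, but it won't matter until further along your learning curve.)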
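For comparison, here is a sketch of the same total computed with Array#sum, which is available in Ruby 2.4 and later. The hard-coded spent array stands in for the values extracted earlier.

```ruby
spent = ["2548", "4352"]

# The inject form: convert each string to an integer, then add them up
total_inject = spent.map(&:to_i).inject(:+)

# Array#sum with a block converts and adds in a single pass
total_sum = spent.sum { |value| value.to_i }

total_sum
# => 6900

# The two differ on an empty array: inject(:+) returns nil, sum returns 0
empty_inject = [].inject(:+)
empty_sum = [].sum
```

Also keep in mind that to_i returns 0 for non-numeric strings, so a malformed field would silently count as zero; Integer(value) raises instead if you would rather catch bad data.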

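I was just wondering whether it's possible to replace the 'when' condition with something that iterates over an array of strings instead of hard-coded strings?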

Here's a solution using an Array:

NAMES = %w[Dan Carlos]

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case row['Name']
  when *NAMES
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

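If the list of names is large, I think this solution will run slower than necessary. Arrays are great for storing data you're going to access sequentially, as a queue, or for remembering order like a stack, but they're bad when you have to walk them just to find something. Even a sorted Array with a binary search is likely to be slower than a Hash because of the extra steps involved. Here's an alternate way of doing this, but using a Hash: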

NAMES = %w[Dan Carlos].map{ |n| [n, true] }.to_h

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  case
  when NAMES[row['Name']]
    ary << row['Total $ spent']
  end
end

spent
# => ["2548", "4352"]

But that can be refactored to be more readable:

NAMES = %w[Dan Carlos].each_with_object({}) { |a, h| h[a] = true }
# => {"Dan"=>true, "Carlos"=>true}

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary| 
  ary << row['Total $ spent'] if NAMES[row['Name']]
end

spent
# => ["2548", "4352"]
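Another option, not used above, is Ruby's standard-library Set, which offers the same hash-based membership test without building the true values by hand. This is a sketch under that assumption; the local variable wanted is my own name.

```ruby
require 'csv'
require 'set'

data = <<~CSV_DATA
  Name,Times arrived,Total $ spent,Food feedback
  Dan,34,2548,Lovin it!
  Maria,55,5054,"Good, delicious food"
  Carlos,22,4352,"I am ""pleased"", but could be better"
  Stephany,34,6542,I want bigger steaks!!!!!
CSV_DATA

# A Set answers include? with a hash lookup rather than a linear scan
wanted = Set.new(%w[Dan Carlos])

spent = CSV.parse(data, headers: :first_row).each_with_object([]) do |row, ary|
  ary << row['Total $ spent'] if wanted.include?(row['Name'])
end

spent
# => ["2548", "4352"]
```

For a handful of names the difference from the Array version is negligible; it only starts to matter when the list of names gets long.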

Selecting a single value field from a CSV file
