Ruby: How do I read a CSV file that has two header rows?

Date: 2013-06-06 00:39:43

Tags: ruby-on-rails ruby parsing csv

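I have a .CSV file that I'm trying to parse with Ruby's CSV library. The file has two header rows, which I've never run into before, and I don't know how to handle it. Below is an example of the headers and a data row.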

Line 1

"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"

Line 2

"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"

Line 3

"721","Air Force","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"

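The blank lines between rows aren't in the actual file; I only added them above to make it easier to read. Does CSV have a method for handling this structure, or do I have to write my own? Thanks!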

5 answers:

Answer 0 (score: 8)

Your CSV file appears to have been generated from an Excel spreadsheet whose columns are grouped like this:

... |        Rushing        |         Passing         | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...

(I'm not sure I reconstructed the groups correctly.)

AFAIK there is no standard tool for working with this kind of CSV file. You'll have to do it by hand:

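  • Read the first line and treat it as the first header row.
  • Read the second line and treat it as the second header row.
  • Read the third line and treat it as the first data row.
  • ...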
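The steps above can be sketched with Ruby's bundled csv library. This is a minimal sketch, not code from the answer: it assumes a group name in the first header row applies to the blank cells to its right (as in the Excel layout shown), and the helpers `combine_headers` and `read_two_header_csv`, plus the `"Group: Subheader"` naming, are hypothetical choices.

```ruby
require 'csv'

# Hypothetical helper: merge two header rows into one list of column names.
# A non-empty cell in the top row starts a group; blank top cells to its right
# are assumed to belong to that same group.
def combine_headers(top, bottom)
  group = nil
  top.zip(bottom).map do |upper, lower|
    upper = upper.to_s.strip # quoted "" parses as "", unquoted as nil
    lower = lower.to_s.strip
    group = upper unless upper.empty?
    if lower.empty?
      group.to_s             # e.g. "Institution ID", "Total TD"
    elsif group.nil? || group.empty?
      lower
    else
      "#{group}: #{lower}"   # e.g. "Rushing: Gain"
    end
  end
end

# Read header row 1, header row 2, then turn each data row into a hash.
def read_two_header_csv(io)
  csv = CSV.new(io)
  headers = combine_headers(csv.shift, csv.shift)
  csv.map { |row| headers.zip(row).to_h }
end

sample = <<~CSV_DATA
  "Institution ID","Rushing","","Passing",""
  "","Rushes","Gain","Att","Cmp"
  "721","10","55","3","2"
CSV_DATA

rows = read_two_header_csv(sample)
rows.first["Rushing: Gain"] # => "55"
```

Combining the rows into compound names keeps duplicate subheaders such as the two "TD" columns distinct, which a plain `zip` with the second header row alone would not.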

Answer 1 (score: 4)

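I'd suggest using the smarter_csv gem and supplying the correct headers yourself:

 require 'smarter_csv'
 options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name",
                                       # ... provide all headers here ...
                                      ],
            :headers_in_file => false}
 data = SmarterCSV.process(filename, options)
 data.shift # to drop the first header line
 data.shift # to drop the second header line
 # data now contains an array of hashes with your data

See the GitHub page for options and examples: https://github.com/tilo/smarter_csv

The option you want is :user_provided_headers; just list the headers you want in an array. That way you can work around this kind of problem.

Because :headers_in_file is false, the two header lines in the file come back as ordinary data rows, so you have to remove them from the front of the array with data.shift.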
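If you'd rather not add a gem, the same "supply your own headers" idea works with Ruby's bundled csv library, whose `headers:` option also accepts an array. A minimal sketch, assuming you list the headers yourself; the three-element `HEADERS` array and the `parse_with_headers` helper are illustrative, not part of the answer.

```ruby
require 'csv'

# Assumption: you supply the full header list yourself (abbreviated here).
HEADERS = ["Institution ID", "Institution", "Game Date"].freeze

# With an Array for headers:, CSV treats every line in the file as data,
# so the file's own two header lines must be discarded explicitly.
def parse_with_headers(io, headers)
  csv = CSV.new(io, headers: headers)
  2.times { csv.shift } # discard the two header rows in the file
  csv.map(&:to_h)       # remaining rows as hashes keyed by HEADERS
end

sample = <<~CSV_DATA
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
CSV_DATA

rows = parse_with_headers(sample, HEADERS)
rows.first["Institution"] # => "Air Force"
```

Discarding rows with `shift` rather than dropping raw text lines keeps the count correct even if a header cell contains a quoted line break.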

Answer 2 (score: 3)

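You have to write your own logic. A CSV file is really just rows and columns; it attaches no inherent meaning to any column or row, it's just raw data. So CSV has no concept of, or awareness that, your file has two header rows. That's a human interpretation, so you need to build your own heuristic.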

Given that your data rows look like this:

"721","Air Force","09/01/12",

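When you start parsing your data, try converting the first column to an integer: if it converts and is greater than 0, you know you're dealing with a valid data row rather than a header.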
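A minimal sketch of that heuristic, assuming (as in the question's data) that every data row starts with a numeric institution ID while header rows start with text or an empty cell; `data_row?` is a hypothetical helper name.

```ruby
require 'csv'

# Heuristic: a row is data when its first column parses as a positive integer.
# Header cells like "Institution ID" or "" make Integer() raise ArgumentError.
def data_row?(row)
  Integer(row.first.to_s, 10) > 0
rescue ArgumentError
  false
end

sample = <<~CSV_DATA
  "Institution ID","Institution"
  "",""
  "721","Air Force"
CSV_DATA

data = CSV.parse(sample).select { |row| data_row?(row) }
# data == [["721", "Air Force"]]
```

Unlike skipping a fixed number of lines, this also filters out header rows repeated further down the file, at the cost of relying on the ID column always being numeric.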

Answer 3 (score: 1)

Read the CSV in and skip the first two rows of the output:

require 'csv'

arr_of_arrs = CSV.read("path/to/file.csv")
# Skip the two header rows; drop(2) returns [] if the file has fewer rows
arr_of_arrs.drop(2).each do |x|
   # operation here
end

Answer 4 (score: 1)

This is pretty easy to do with CSV. Just check the number of the line that was last read, and skip rows until the headers have been read:

require 'csv'

CSV.foreach('test.csv') do |row|
  next unless $. > 2
  puts "'" + row.join("', '") + "'"
end

When run, this is the output:

'721', 'Air Force', '09/01/12', '19', 'BASKA', 'DAVID', '', '', '', '', '', '', '', '', '', '', '', '0', '0', '', '', '', '', '', '', '', '', '', '2', '85', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0'

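$. is the line number of the last line read from the currently open file, so this skips straight to the next iteration until $. shows that both header lines have been read.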
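`$.` counts physical lines read from the file, so it can drift from the row count if a quoted field contains an embedded line break. A sketch that counts parsed rows instead, using the enumerator `CSV.foreach` returns when called without a block; the helper `each_data_row` and the temporary file are illustrative assumptions.

```ruby
require 'csv'
require 'tempfile'

# Yield only data rows, skipping the first two *parsed* rows (the headers).
def each_data_row(path)
  CSV.foreach(path).with_index do |row, index|
    next if index < 2 # rows 0 and 1 are the two header rows
    yield row
  end
end

rows = []
Tempfile.create(['games', '.csv']) do |file|
  file.write(%("Institution ID","Institution"\n"",""\n"721","Air Force"\n))
  file.flush # make the contents visible when reopened by path
  each_data_row(file.path) { |row| rows << row }
end
rows # => [["721", "Air Force"]]
```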