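I have a ".CSV" file that I'm trying to parse using CSV in Ruby. The file has two header rows, which I've never run into before, and I don't know how to handle it. Below is a sample of the header rows and a data row.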
Row 1
"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"
Row 2
"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"
Row 3
"721","Air Force","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"
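The file itself doesn't contain the extra line breaks shown above; I added them only to make the example easier to read. Does CSV have any methods for handling this structure, or will I have to write my own? Thanks!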
Answer 0 (score: 8)
Your CSV file looks like it was exported from an Excel spreadsheet whose columns are grouped like this:
... | Rushing | Passing | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...
(I'm not sure I reconstructed the groups correctly.)
AFAIK there's no standard tool for this kind of CSV file; you'll have to handle it by hand.
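As a rough sketch of what "by hand" could look like: assuming, as in this file, that a blank cell in the first header row continues the group to its left, the two header rows can be folded into single names such as "Rushing Rushes", and each data row zipped into a hash. The helper merge_header_rows and the trimmed sample are hypothetical illustrations, not part of Ruby's CSV library.

```ruby
require 'csv'

# Hypothetical helper: combine a "group" header row (e.g. "Rushing") with a
# "sub-header" row (e.g. "Rushes", "Gain") into one flat list of names.
# Assumes a blank group cell continues the group to its left.
def merge_header_rows(group_row, sub_row)
  width = [group_row.size, sub_row.size].max
  current_group = ""
  (0...width).map do |i|
    group = group_row[i].to_s.strip
    sub   = sub_row[i].to_s.strip
    current_group = group unless group.empty? # carry the group name forward
    if sub.empty?
      current_group                 # e.g. "Institution ID", "Points"
    elsif current_group.empty?
      sub
    else
      "#{current_group} #{sub}"     # e.g. "Rushing Rushes"
    end
  end
end

# Sample shaped like the question's file, trimmed to five columns.
sample = <<~TEXT
  "Institution ID","Institution","Rushing","",""
  "","","Rushes","Gain","Loss"
  "721","Air Force","","",""
TEXT

rows    = CSV.parse(sample)
headers = merge_header_rows(rows[0], rows[1])
records = rows.drop(2).map { |r| headers.zip(r).to_h }
```

For the trimmed sample, headers comes out as "Institution ID", "Institution", "Rushing Rushes", "Rushing Gain", "Rushing Loss". With the real file you would read it with CSV.read instead of parsing a string.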
Answer 1 (score: 4)
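I'd suggest using the smarter_csv gem and supplying the correct headers yourself:
require 'smarter_csv'
options = {
  :user_provided_headers => [
    "Institution ID", "Institution", "Game Date", "Uniform Number", "Last Name", "First Name",
    # ... provide all headers here ...
  ],
  :headers_in_file => false
}
data = SmarterCSV.process(filename, options)
data.shift # drop the first header line (shift removes from the front; pop would drop the last data row)
data.shift # drop the second header line
# data now contains an array of hashes with your data
See the GitHub page for options and examples: https://github.com/tilo/smarter_csv
The option you want is :user_provided_headers: just list the headers you want in an array. That's how you get around this kind of file.
Because the header lines are still in the file, you then have to call data.shift twice to discard them.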
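If you'd rather not add a gem dependency, the same idea (supply the headers yourself, then drop the file's two header lines) can be sketched with Ruby's standard CSV library, whose headers: option also accepts an Array. MY_HEADERS and the trimmed sample below are hypothetical; in practice you would list every column.

```ruby
require 'csv'

# Hypothetical, shortened header list; list every column for the real file.
MY_HEADERS = ["Institution ID", "Institution", "Game Date"].freeze

# Sample shaped like the question's file, trimmed to three columns.
text = <<~TEXT
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
TEXT

# When headers: is an Array, CSV uses it as the header names and returns every
# line of the input, including the file's own two header lines, as rows.
table = CSV.parse(text, headers: MY_HEADERS)

# Skip the two header lines, then turn each CSV::Row into a plain Hash.
data = table.drop(2).map(&:to_h)
```

For a file on disk, CSV.read(path, headers: MY_HEADERS) should behave the same way.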
Answer 2 (score: 3)
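You'll have to write your own logic. A CSV file is really just rows and columns; no row or column carries any inherent meaning, it's just raw data. CSV has no idea that your file has two header rows. That's a human interpretation, so you need to build your own heuristic.
Given that your data rows look like this: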
"721","Air Force","09/01/12",
when you start parsing, look at the first column: if converting it to an integer gives a value > 0, you know you're dealing with a valid data row rather than a header.
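A minimal sketch of that heuristic, assuming the first column is always a positive integer ID on data rows. The helper name data_row? and the sample are hypothetical. It uses Kernel#Integer with exception: false (Ruby 2.6+) so that blanks and header text return nil instead of raising.

```ruby
require 'csv'

# Hypothetical helper: a line counts as data if its first field is a positive
# base-10 integer. Both header lines fail ("Institution ID" and "").
def data_row?(row)
  # Base 10 keeps an ID such as "0721" from being parsed as octal.
  id = Integer(row.first.to_s, 10, exception: false)
  !id.nil? && id > 0
end

text = <<~TEXT
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
TEXT

data = CSV.parse(text).select { |row| data_row?(row) }
```

A plain row.first.to_i > 0 also works for this file, but String#to_i accepts input such as "12abc" as 12, whereas Integer rejects anything that isn't a whole number.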
Answer 3 (score: 1)
Read the CSV in and skip the first two rows of the result:
require 'csv'
arr_of_arrs = CSV.read("path/to/file.csv")
arr_of_arrs.drop(2).each do |x| # skip the two header rows
  # operation here
end
Answer 4 (score: 1)
This is easy to do with CSV. Just watch the line number of the row that was just read, and skip rows until you're past the headers:
require 'csv'
CSV.foreach('test.csv') do |row|
next unless $. > 2
puts "'" + row.join("', '") + "'"
end
When run, this is the output:
'721', 'Air Force', '09/01/12', '19', 'BASKA', 'DAVID', '', '', '', '', '', '', '', '', '', '', '', '0', '0', '', '', '', '', '', '', '', '', '', '2', '85', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '0'
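$. is the line number of the last line read from the currently open file, so next unless $. > 2 skips rows until the two header lines have been read.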
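$. is set by the IO reads underneath CSV, so it counts lines read from the file rather than parsed rows, and the two may drift apart (for example, when a quoted field spans several lines). A sketch that counts parsed rows instead, using with_index on the Enumerator that CSV.foreach returns when called without a block; each_data_row and the temporary sample file are hypothetical.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: yield every row after the two header lines, counting
# parsed rows with with_index instead of reading the global $.
def each_data_row(path)
  # Without a block, CSV.foreach returns an Enumerator; the file is still
  # read one row at a time.
  CSV.foreach(path).with_index do |row, index|
    next if index < 2 # rows 0 and 1 are the two header lines
    yield row
  end
end

# Usage with a small temporary file shaped like the question's data.
file = Tempfile.new(["two_headers", ".csv"])
file.write(<<~TEXT)
  "Institution ID","Institution"
  "",""
  "721","Air Force"
TEXT
file.flush

rows = []
each_data_row(file.path) { |row| rows << row }
file.close!
```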