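I am parsing a CSV file with Ruby. The delimiter is a comma, but some of my data also contains commas.
Fields that contain commas are wrapped in double quotes (""), but I don't know how to make CSV ignore the commas inside the quotes.
Sample CSV data (File.csv):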
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
Sample code:
require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
  puts x[1]
end
Current output: " 84.07 FT OF 25
Expected output: 84.07 FT OF 25, ALL OF 26,
Link to a gist with the sample file and code: https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c
Answer 0 (score: 2)
Try using the force_quotes option:
require 'csv'
CSV.foreach("data.csv", encoding: 'iso-8859-1:utf-8', quote_char: '"', force_quotes: true) do |x|
  puts x[1]
end
Result:
84.07 FT OF 25, ALL OF 26,
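As a hedged aside: force_quotes is documented as an option for generating CSV, so the part of this answer that matters when reading is most likely quote_char: '"', which is also the default. The sketch below compares the default quote character with the "\x00" setting from the question on the sample row. SAMPLE_ROW and the two helper methods are hypothetical names introduced only for this illustration.

```ruby
require 'csv'

# Sample row from the question; the second field is quoted because it contains commas.
SAMPLE_ROW = 'NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'

# Hypothetical helper: parse with the default quote character ("),
# so commas inside the quoted field are not treated as separators.
def second_field_default(line)
  CSV.parse_line(line)[1]
end

# Hypothetical helper: parse with quote_char set to NUL, as in the question.
# Double quotes become ordinary characters and every comma splits the row.
def second_field_nul_quote(line)
  CSV.parse_line(line, quote_char: "\x00")[1]
end

puts second_field_default(SAMPLE_ROW).inspect
puts second_field_nul_quote(SAMPLE_ROW).inspect
```

With the default quote character the whole quoted field should come back as one value, while the NUL setting should reproduce the truncated field shown as the current output in the question.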
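Answer 1 (score: 0)
An illegal quoting error occurs when a line contains quotes that do not wrap the entire column, for example if your CSV looks like this: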
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
You can parse each line individually and change the quote character only for the lines that misuse quotes:
require 'csv'
def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end
def parse_line(line)
  options = { encoding: 'iso-8859-1:utf-8' }
  begin
    # splat the hash so the options are passed as keyword arguments (needed on Ruby 3+)
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # this line is misusing quotes, change the quote character and try again
    options.merge! quote_char: "\x00"
    retry
  end
end
parse_file('./File.csv')
Running it gives you:
["NCB 14591 BLK 13 LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592 BLK 14 LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]
However, if a single line mixes bad quoting with good quoting, this breaks again. Ideally, you would clean up the CSV so that it is valid.
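One possible alternative to switching quote characters per line, offered here as a sketch rather than as part of the original answer, is the liberal_parsing option of Ruby's csv library. It assumes a Ruby version where that option is available (it was introduced with Ruby 2.4). The helper name parse_liberally is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: parse one line while tolerating quotes that
# appear in the middle of an otherwise unquoted field.
def parse_liberally(line)
  CSV.parse_line(line, liberal_parsing: true)
end

# The two sample rows from the answer above.
rows = [
  'NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR',
  'NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR'
]

rows.each { |row| puts parse_liberally(row).inspect }
```

Because the option is applied to every line, there is no rescue-and-retry step. Whether it copes with every malformed line in a real file is not guaranteed, so cleaning the data at the source remains the safer route.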