How to parse CSV data that contains commas in Ruby

Date: 2017-08-19 20:40:43

Tags: ruby-on-rails ruby csv

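I'm parsing a CSV file with Ruby, but the problem is that the delimiter is a comma and my data also contains commas.

Fields that contain commas are wrapped in double quotes (""), but I don't know how to make CSV ignore the commas inside the quotes.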

Sample CSV data (File.csv):

NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR

Sample code:

require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
  puts x[1]
end

Current output: " 84.07 FT OF 25

Expected output: 84.07 FT OF 25, ALL OF 26,

Link to a gist with the sample file and code: https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c

2 answers:

Answer 0 (score: 2)

Try using the force_quotes option:

require 'csv'
CSV.foreach("data.csv", encoding:'iso-8859-1:utf-8', quote_char: '"', force_quotes: true).each do |x|
  puts x[1]
end

Result:

84.07 FT OF 25, ALL OF 26,
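For what it's worth, the CSV documentation describes force_quotes as a writing option (it makes CSV quote every field it generates), so it shouldn't change how input is read. What seems to make the difference is that quote_char: '"' replaces the "\x00" from the question, and '"' is already CSV's default. The sketch below parses the question's sample line three ways with CSV.parse_line on a string, so no file is needed; the variable names are just for illustration.

```ruby
require 'csv'

# Sample line from the question (File.csv)
LINE = 'NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'

# Default options: '"' is the quote character, so commas inside quotes stay in the field
default_row = CSV.parse_line(LINE)

# force_quotes only affects writing, so the parsed result matches the default
forced_row = CSV.parse_line(LINE, force_quotes: true)

# quote_char "\x00" (as in the question): '"' becomes an ordinary character,
# so every comma splits the line, including the ones inside the quotes
no_quote_row = CSV.parse_line(LINE, quote_char: "\x00")

p default_row[1]   # => " 84.07 FT OF 25, ALL OF 26,"
p forced_row[1]    # => " 84.07 FT OF 25, ALL OF 26,"
p no_quote_row[1]  # => "\" 84.07 FT OF 25"
```

In other words, dropping quote_char: "\x00" from the question's code (and keeping the default quote character) should be enough on its own for this line.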

Answer 1 (score: 0)

The "Illegal quoting" error occurs when a line has quotes that don't wrap the entire column, for example if your CSV looks like:

NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592  BLK 14  LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
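To see the error being described, here is a minimal sketch that feeds the second sample line to CSV.parse_line with default options and rescues CSV::MalformedCSVError. The exact message text may differ between csv versions, so only the exception class should be relied on.

```ruby
require 'csv'

# Second sample line: the quotes around 25 don't wrap the whole field
BAD_LINE = 'NCB 14592  BLK 14  LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR'

error = nil
begin
  CSV.parse_line(BAD_LINE)
rescue CSV::MalformedCSVError => e
  # The message is typically something like "Illegal quoting in line 1."
  error = e
end

puts error ? "#{error.class}: #{error.message}" : "parsed without error"
```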

You can parse each line individually and change the quote character only for the lines that misuse quotes:

require 'csv'

def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end

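def parse_line(line)
  options = { encoding: 'iso-8859-1:utf-8' }
  begin
    # Splat the hash as keywords: Ruby 3.0+ no longer converts a positional hash
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # Quoting is already disabled and the line still fails: re-raise instead of retrying forever
    raise if options.key?(:quote_char)

    # this line is misusing quotes, change the quote character and try again
    options.merge!(quote_char: "\x00")
    retry
  end
end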

parse_file('./File.csv')

Running it gives you:

["NCB 14591  BLK 13  LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592  BLK 14  LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]

But if a single line mixes bad quoting with good quoting, this breaks down again. Ideally, you'd clean up the CSV so that it's valid.
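If cleaning the file isn't an option, Ruby's CSV also has a liberal_parsing option (available since Ruby 2.4) that accepts quotes inside unquoted fields instead of raising. This is a different technique from the per-line retry above, and it can't rescue commas that sit inside such stray quotes: those still split the field. The sketch uses a made-up mixed line (MIXED, not from the question) to show the breakage described above, then runs the two sample lines through liberal_parsing.

```ruby
require 'csv'

# Hypothetical line: a properly quoted field plus stray quotes in the last field
MIXED = 'a,"b, c",d "e" f'

# Default quoting: the stray quotes in the last field raise an error
strict_error = begin
  CSV.parse_line(MIXED)
  nil
rescue CSV::MalformedCSVError => e
  e
end

# Quoting disabled (the retry fallback): the quoted comma now splits "b, c"
no_quote_row = CSV.parse_line(MIXED, quote_char: "\x00")

# liberal_parsing: "b, c" stays whole and the stray quotes are kept as-is
liberal_row = CSV.parse_line(MIXED, liberal_parsing: true)

p strict_error.class  # => CSV::MalformedCSVError
p no_quote_row        # => ["a", "\"b", " c\"", "d \"e\" f"]
p liberal_row         # => ["a", "b, c", "d \"e\" f"]

# The two sample lines from the answer, parsed in one pass without a retry
SAMPLE = <<~DATA
  NCB 14591  BLK 13  LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
  NCB 14592  BLK 14  LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
DATA

rows = CSV.parse(SAMPLE, liberal_parsing: true)
rows.each { |row| p row }
```

For these two lines, liberal_parsing should give the same rows as the per-line retry, but it is more permissive overall, so malformed input can slip through silently rather than raising.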