Question

I have the following line in a CSV file that is giving me problems when parsing:

312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';

I am parsing it with the following parameters:

FasterCSV.foreach(file, {:headers => true, :quote_char => '"', :col_sep => "','"} ) do |row|

However, it blows up on lines like the one above because of the "J.D." inside the column. How can I parse this line correctly with FasterCSV?

Thanks!

Answer 1

I think your :quote_char should be ' (a single quote) and your :col_sep should be , (a comma). In that case:

FasterCSV.foreach(file, {:headers => true, :quote_char => "'", :col_sep => ','} ) ...
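To see what these options do with the line from the question, here is a minimal sketch. It uses Ruby's standard `csv` library (FasterCSV became the standard CSV library in Ruby 1.9 and kept the same option names) and parses the line from a string rather than a file, without `:headers`. The `row_sep: ";\n"` option is an assumption that is not part of the answer above: it presumes every record ends with `;` followed by a newline, as the sample line suggests, since a stray `;` after the closing quote would otherwise be rejected.

```ruby
require "csv" # FasterCSV became Ruby's standard CSV library in 1.9

# The problem line from the question, including its trailing ";".
line = %q{312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';} + "\n"

# With quote_char "'", the double quotes around J.D. are ordinary characters.
# row_sep ";\n" is an assumption: each record ends with ";" and a newline.
rows = CSV.parse(line, quote_char: "'", col_sep: ",", row_sep: ";\n")

row = rows.first
puts row.size        # number of fields in the record
puts row[3].inspect  # the name field, with its embedded comma and double quotes
```

If the real file uses different line endings (for example `";\r\n"`), the `row_sep` value would need to match them.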

Answer 2

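You can't. FasterCSV only lets you choose one quote character, and your data needs two. There's no way to do something cute like passing a regular expression instead of a character, because FasterCSV precompiles the matchers for the quote character, like so: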

# prebuild Regexps for faster parsing
esc_col_sep = Regexp.escape(@col_sep)
esc_row_sep = Regexp.escape(@row_sep)
esc_quote   = Regexp.escape(@quote_char)
@parsers = {
  :any_field      => Regexp.new( "[^#{esc_col_sep}]+",
                                 Regexp::MULTILINE,
                                 @encoding ),
  :quoted_field   => Regexp.new( "^#{esc_quote}(.*)#{esc_quote}$",
                                 Regexp::MULTILINE,
                                 @encoding ),
  ...
}
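The effect of that `Regexp.escape` call can be sketched in isolation. The snippet below is only an illustration modeled on the `:quoted_field` matcher quoted above, not FasterCSV's actual code path; the variable names are hypothetical and the encoding argument is omitted.

```ruby
# Build a quoted-field matcher the same way the excerpt does, for one quote char.
esc_quote    = Regexp.escape('"')
quoted_field = Regexp.new("^#{esc_quote}(.*)#{esc_quote}$", Regexp::MULTILINE)

puts quoted_field.match?(%q{"J.D."}) # field wrapped in the chosen quote char
puts quoted_field.match?(%q{'J.D.'}) # field wrapped in a different quote char

# Trying to smuggle in a character class fails: escaping turns it into literal text.
esc_class   = Regexp.escape(%q{['"]})
class_field = Regexp.new("^#{esc_class}(.*)#{esc_class}$", Regexp::MULTILINE)

puts esc_class
puts class_field.match?(%q{'J.D.'})
```

Because the quote string is escaped before it is interpolated into the pattern, anything passed as the quote character is matched literally, which is why a pattern covering both ' and " cannot be slipped in this way.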

Answer 3

I wasn't able to bend FasterCSV into handling this data the way I needed, so in the end I simply requested a new data dump with proper CSV output. Thanks for trying!
