How to fix a CSV read error caused by a semicolon at the end of a line

Date: 2019-05-22 15:01:42

Tags: ruby csv

I want to read a file using Ruby's CSV class.

The file to be read looks like this:

CM_ SG_ 1325 XXX_Address "XXX address";
CM_ SG_ 612 YYY_MsgCounter "incremented by 1 each time a 
message has been transmitted";

My Ruby code:

#!/usr/bin/env ruby
require 'pp'
require 'csv'
CSV.foreach(ARGV[0],:col_sep=>" ") do |row|
    pp row
end

This is the error I get:

C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1898:in `block in shift': Unclosed quoted field on line 1. (CSV::MalformedCSVError)
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1805:in `loop'
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1805:in `shift'
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1747:in `each'
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1131:in `block in foreach'
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1282:in `open'
        from C:/ruby-2.3.3-x64-mingw32/lib/ruby/2.3.0/csv.rb:1130:in `foreach'
        from test.rb:4:in `<main>'

If I remove the semicolons at the end of the lines, I get:

["CM_", "SG_", "1325", "XXX_Address", "XXX address"]
["CM_",
 "SG_",
 "612",
 "YYY_MsgCounter",
 "incremented by 1 each time a \r\nmessage has been transmitted"]

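This is what I expect to see.

I assume the problem is that CSV doesn't like the semicolon right after the closing quote. Is there a way to remove that semicolon with a CSV option, or to feed CSV a stream from which I have already removed it?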

Clarification:

Sorry that I didn't state this explicitly, but not every line will have a semicolon.

Also, I'd like to thank the Tin Man for making superfluous edits to my post to boost his score. ;)

2 Answers:

Answer 0 (score: 1)

Since you know that every line ends with a semicolon, you can simply specify the row separator, e.g.

CSV.foreach(ARGV[0],col_sep:" ", row_sep:";").to_a
#=> [["CM_", "SG_", "1325", "XXX_Address", "XXX address"], 
#    ["CM_", "SG_", "612", "YYY_MsgCounter", "incremented by 1 each time a message has been transmitted"]]

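You will lose the newline inside that field; I'm not sure whether that matters to you.

Note that, based on my discussion with @iGian, this solution works for Ruby < 2.6.0, while his solution works for Ruby >= 2.6.0.

Answer 1 (score: 0)

Try this, for Ruby 2.6.1: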

require 'pp'
require 'csv'

CSV.foreach(ARGV[0], col_sep: ' ', row_sep: :auto, liberal_parsing: {double_quote_outside_quote: true} ) do |row|
    pp row
end

This seems to work. See this issue: https://github.com/ruby/csv/issues/66
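
For the other route the question mentions, giving CSV input from which the semicolons have already been removed, a minimal sketch could look like the one below. The helper names strip_trailing_semicolons and parse_cm_lines are hypothetical, and the sketch assumes the unwanted semicolon always sits directly after a closing quote at the end of a line. Lines that have no trailing semicolon pass through unchanged, which matches the clarification in the question.

```ruby
require 'csv'

# Hypothetical helper: drop a semicolon (plus any trailing spaces or tabs)
# that directly follows a double quote at the end of a line.
# Assumption: the semicolon is never part of the data in that position.
def strip_trailing_semicolons(text)
  text.gsub(/(?<=");[ \t]*(?=\r?\n|\z)/, '')
end

# Hypothetical helper: clean the text first, then parse it as
# space-separated CSV, as in the question.
def parse_cm_lines(text)
  CSV.parse(strip_trailing_semicolons(text), col_sep: ' ')
end

sample = <<~DATA
  CM_ SG_ 1325 XXX_Address "XXX address";
  CM_ SG_ 612 YYY_MsgCounter "incremented by 1 each time a
  message has been transmitted";
DATA

rows = parse_cm_lines(sample)
rows.each { |row| p row }
```

To run it against a file, pass File.read(ARGV[0]) to parse_cm_lines instead of the sample string. One caveat: a quoted field that itself contains a quote followed by a semicolon at the end of a line would also be altered, so this is only suitable when the data cannot contain that sequence.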