Ruby unable to parse a CSV file: CSV::MalformedCSVError (Illegal quoting in line 1.)

Asked: 2013-05-27 12:04:07

Tags: ruby csv malformed

Ubuntu 12.04 LTS

Ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

Here are the contents of the CSV file I received:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"

However, when I try to parse the CSV file, I get an error:

1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 

1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

I then tried simplifying the data to:

"name","age","email"
"jignesh","30","jignesh@example.com"

But I still get the same error:

  1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
      from (irb):23
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

I simplified the data once more, removing the quotes:

name,age,email
jignesh,30,jignesh@example.com

And that works. See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  jignesh@example.com
   => nil 

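But I will be receiving CSV files with quoted data, so removing the quotes is not really the solution I am looking for. I cannot figure out what is causing the error CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.

I have verified that there are no leading or trailing spaces in the CSV by enabling "Show whitespace characters" and "Show line endings" in my text editor. I also verified the encoding using the following: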

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 

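Note: I also tried CSV.read, but that method fails with the same error.

Can anybody please help me resolve the problem and understand where things are going wrong?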

=====================

I just found the following post: http://www.ruby-forum.com/topic/448070 and tried the following:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)

  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

and got the following output:

   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

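Because the default col_sep is a comma, the data was not split correctly (note how the date in the second row was broken into two fields). I then tried the quote_char option, as follows: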

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")

But it ended up with the following error:

   CSV::MalformedCSVError (Illegal quoting in line 1.):

Thanks, Jignesh

10 Answers:

Answer 0 (score: 21)

quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end

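It's not perfect, but it works most of the time.

N.B. CSV.parse takes the same options as CSV.read, so the same approach works with either a file or in-memory data.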
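One caveat with the fallback above may be worth spelling out: when a retry succeeds with a quote character other than ", the double quotes in the data are treated as ordinary characters and stay in the parsed fields. A minimal sketch using CSV.parse on in-memory data (the sample string is made up for illustration):

```ruby
require "csv"

# Hypothetical sample data, not taken from the question's file.
data = "\"name\",\"age\"\n\"jignesh\",\"30\"\n"

# Default quote_char ("): the quotes are consumed as field delimiters.
default_rows = CSV.parse(data)

# quote_char "|": the " characters are plain data and remain in each field.
pipe_rows = CSV.parse(data, quote_char: "|")

p default_rows.last
p pipe_rows.last
```

So a fallback quote character can get a file to load, but the values may then need their surrounding quotes stripped afterwards.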

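Answer 1 (score: 15)

Anand, thanks for your encoding suggestion. That solved my illegal quoting problem.

Note: if you want the iterator to skip the header row, add headers: :first_row, like this: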

CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row)
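The debug output in the question shows \xEF\xBB\xBF in front of the first field, which is a UTF-8 byte order mark (BOM). That would explain why the encoding option helps: with the BOM in place, the first field no longer starts with a quote. The sketch below reproduces this under that assumption; the temporary file name and its contents are invented for the example.

```ruby
require "csv"
require "tmpdir"

# Hypothetical sample file: a UTF-8 BOM followed by quoted fields.
path = File.join(Dir.mktmpdir, "bom_sample.csv")
File.binwrite(path, "\uFEFF\"name\",\"age\"\n\"jignesh\",\"30\"\n")

# A plain UTF-8 read keeps the BOM, so parsing the string is rejected.
raw = File.read(path, encoding: "utf-8")
begin
  CSV.parse(raw)
  raw_error = nil
rescue CSV::MalformedCSVError => e
  raw_error = e.message
end

# "bom|utf-8" asks Ruby to drop a leading BOM while reading the file.
rows = CSV.read(path, encoding: "bom|utf-8")

p raw_error
p rows
```

If the data is already in memory, removing the BOM first with something like raw.sub(/\A\uFEFF/, "") should have the same effect.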

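Answer 2 (score: 12)

I just ran into an issue like this and found that CSV does not like whitespace between the col_sep and the quote character. Once I removed it, everything went smoothly. So I had: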

12,  "N",  12, "Pacific/Majuro"

But once I stripped that whitespace using

.gsub(/,\s+"/, ',"')

which results in

12,"N",  12,"Pacific/Majuro"

everything went smoothly.
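A small sketch of this answer's fix, using the sample line from above. The variable names are only for illustration.

```ruby
require "csv"

line = '12,  "N",  12, "Pacific/Majuro"'

# Whitespace before an opening quote makes the field look unquoted,
# so the quote that follows is rejected.
begin
  CSV.parse_line(line)
  strict_ok = true
rescue CSV::MalformedCSVError
  strict_ok = false
end

# Remove whitespace that sits between a comma and an opening quote.
cleaned = line.gsub(/,\s+"/, ',"')
fields = CSV.parse_line(cleaned)

p strict_ok
p fields
```

Note that this substitution is applied to the raw text, so it would also change a comma followed by whitespace and a quote that happens to appear inside a quoted field.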

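Answer 3 (score: 1)

I ran into this problem with a trademark character.

The trademark character was being converted to \"! in UTF-8, so the unmatched opening quote was throwing the error. So I did this: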

.gsub!("\"!", "")

Then I tried creating my CSV object, and it worked fine.

Answer 4 (score: 1)

For Rails 6, Ruby 2.4+:

CSV.foreach(file, liberal_parsing: true, headers: :first_row) do |row|
  # do whatever
end

https://ruby-doc.org/stdlib-2.4.0/libdoc/csv/rdoc/CSV.html

Answer 5 (score: 0)

I was trying to read the file into a string and then parse it into a CSV table, but got an exception:

CSV.parse(File.read('file.csv'), headers: true)
CSV::MalformedCSVError: Unclosed quoted field on line 1794.

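None of the answers here worked for me. In fact, the highest-voted one took so long to parse that I eventually killed the execution. It most likely raised many exceptions, which is slow on a large file.

More problematic, the error message is not very helpful because the CSV file is large. Where exactly is line 1794? I opened the file in LibreOffice and it opened without any problem. Line 1794 was the last line of data in the CSV file, so the problem was evidently at the end of the file. I decided to inspect the contents as a string using File.read, and noticed that the string ended with a carriage return: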

,\"\"\r

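I decided to use chomp to remove the carriage return at the end of the file. Note that if $/ has not been changed from the default Ruby record separator, chomp also removes carriage returns (that is, it removes a trailing \n, \r, or \r\n).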

CSV.parse(File.read('file.csv' ).chomp, headers: true)
 => #<CSV::Table mode:col_or_row row_count:1794>

It worked. The problem was the \r character at the end of the file.
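The chomp behavior described above can be checked with a short sketch. The data string is a made-up stand-in for a file that ends in a stray carriage return.

```ruby
require "csv"

# Hypothetical data: the last line ends with "\r" instead of a normal line ending.
data = "id,note\n1,\"\"\r"

# With the default record separator, chomp strips one trailing "\n", "\r", or "\r\n".
chomped = ["a\n".chomp, "a\r".chomp, "a\r\n".chomp]

# After chomp the stray "\r" is gone and the table parses.
table = CSV.parse(data.chomp, headers: true)

p chomped
p table.size
```

chomp only touches the very end of the string, so this helps when the stray character is at the end of the file, as in this answer, and not when it appears on lines in the middle.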

Answer 6 (score: 0)

From this thread, the option :quote_char => "|"

CSV.read(filename, :quote_char => "|")

Answer 7 (score: 0)

Add the :liberal_parsing => true argument to CSV.read; this should resolve some "Illegal quoting" problems.
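As a rough illustration of what liberal_parsing changes (it is available in the csv library shipped with Ruby 2.4 and later), the sketch below parses a made-up line whose middle field contains quotes without being quoted itself.

```ruby
require "csv"

# Hypothetical line: the second field contains quotes but is not itself quoted.
line = 'a,b "quoted" c,d'

# Strict parsing (the default) rejects the stray quotes.
begin
  CSV.parse_line(line)
  strict_ok = true
rescue CSV::MalformedCSVError
  strict_ok = false
end

# liberal_parsing keeps the quotes as part of the field value.
fields = CSV.parse_line(line, liberal_parsing: true)

p strict_ok
p fields
```

The parser is then guessing at the intended structure, so it is probably worth spot-checking the resulting fields.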

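Answer 8 (score: -1)

A less common cause of this error is a file that does no field quoting at all, while quote_char is still set (by default it is ") and one or more fields happen to contain that character.

To disable field quoting entirely, set quote_char: nil in the parsing options.

For example, given a file /tmp/people.csv that looks like this: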

Actor,Dwayne "The Rock" Johnson,1972-05-02
Character,TV's Frank,1956-08-30

it can be parsed like this:

CSV.read('/tmp/people.csv', quote_char: nil)

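Answer 9 (score: -3)

Try this tip:

  1. Open the CSV file in a text editor
  2. Select the entire file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and save it
  5. Import the new CSV file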