I'm using Ruby 2.4. How do I parse a tab-delimited line that contains a quote character? This is what happens right now:
2.4.0 :003 > line = "11\tDave\tO\"malley"
=> "11\tDave\tO\"malley"
2.4.0 :004 > CSV.parse(line, col_sep: "\t")
CSV::MalformedCSVError: Illegal quoting in line 1.
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1912:in `block (2 levels) in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `block in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `loop'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1770:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `to_a'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `read'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1324:in `parse'
from (irb):4
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console.rb:65:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
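Although this example illustrates my point, I can't easily control what the input contains. So while one answer could be "strip all quotes from the string before parsing," I'd like to preserve as much of the data as possible.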
Answer 0 (score: 1)
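If you're trying to adhere to the CSV standard, then this is a malformed document. Instead, you could just brute-force it and hope the data itself contains no tabs: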
line.split(/\t/)
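One detail worth checking with the brute-force approach: by default `String#split` drops trailing empty fields, so a row whose last columns are blank comes back shorter than expected. A minimal sketch, assuming you want every column preserved (the variable names are only for illustration):

```ruby
line = "11\tDave\tO\"malley"

# A negative limit tells split to keep trailing empty strings.
fields = line.split("\t", -1)
p fields          # => ["11", "Dave", "O\"malley"]

# Same row, but with an empty last column.
trailing = "11\tDave\t".split("\t")
trailing_kept = "11\tDave\t".split("\t", -1)
p trailing        # => ["11", "Dave"]
p trailing_kept   # => ["11", "Dave", ""]
```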
A CSV parsing library earns its keep when you're dealing with data like this, where a quoted field contains the separator:
"1\t2\t\"3a\t3b\"\t4"
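To make the difference concrete, here is a small comparison of a plain split against the CSV parser on that string. It is only a sketch; `data`, `naive` and `parsed` are illustrative names:

```ruby
require 'csv'

data = "1\t2\t\"3a\t3b\"\t4"

# A plain split cuts the quoted field in two and keeps the quote characters.
naive = data.split("\t", -1)
p naive    # => ["1", "2", "\"3a", "3b\"", "4"]

# The CSV parser treats the quoted part as one field containing a tab.
parsed = CSV.parse(data, col_sep: "\t")
p parsed   # => [["1", "2", "3a\t3b", "4"]]
```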
Update: if you're prepared to abuse the CSV library, you can do this:
CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "\0")
This effectively disables quote detection, so if other rows in your data rely on quotes being handled correctly, it may not work for you.
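A short sketch of that trade-off, assuming the same `col_sep` and `quote_char` options are applied to every row (`opts`, `loose` and `quoted` are illustrative names):

```ruby
require 'csv'

# Using a NUL byte as the quote character means a real '"' is just data.
opts = { col_sep: "\t", quote_char: "\0" }

loose = CSV.parse("11\tDave\tO\"malley", **opts)
p loose    # => [["11", "Dave", "O\"malley"]]

# The downside: a properly quoted field is no longer recognised,
# so the tab inside the quotes splits it into two fields.
quoted = CSV.parse("1\t2\t\"3a\t3b\"\t4", **opts)
p quoted   # => [["1", "2", "\"3a", "3b\"", "4"]]
```

With quoting switched off like this, the result is effectively the same as splitting on tabs.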
Answer 1 (score: 0)
"11\tDave\tO\"malley" is not valid CSV data. Oddly enough, the fix is to write the embedded double quote as two double quotes and to quote every element:
2.3.1 :001 > require 'csv'
=> true
2.3.1 :002 > line = "\"11\"\t\"Dave\"\t\"O\"\"malley\""
=> "\"11\"\t\"Dave\"\t\"O\"\"malley\""
2.3.1 :003 > puts line # for clarity
"11" "Dave" "O""malley"
=> nil
2.3.1 :004 > CSV.parse(line, col_sep: "\t")
=> [["11", "Dave", "O\"malley"]]
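If you happen to control the side that writes the file, you don't have to build that escaping by hand. As a sketch, `CSV.generate_line` from the standard library doubles embedded quotes for you and only quotes the fields that need it (`row`, `valid` and `round_trip` are illustrative names):

```ruby
require 'csv'

row = ["11", "Dave", "O\"malley"]

# Only the field containing a quote gets wrapped, and its quote is doubled.
valid = CSV.generate_line(row, col_sep: "\t")
p valid        # => "11\tDave\t\"O\"\"malley\"\n"

# The generated line parses back to the original values.
round_trip = CSV.parse(valid, col_sep: "\t")
p round_trip   # => [["11", "Dave", "O\"malley"]]
```

This doesn't help when the input comes from a source you can't change, which is the situation described in the question.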