Fixing invalid CSV rows that raise an "Illegal quoting" error in Ruby

Time: 2012-05-15 05:29:10

Tags: ruby csv

For some reason, a few lines in my CSV file trigger an "Illegal quoting" error. For example:

1336481227,178.108.171.183,3.2.0,9700132ccc02e12a,c083b5d2-ec92-486f-a5b3-512dba1ce4ae,invoke_action,"{""timestamp"":""2012-05-08 13:47:26""}"

1336481227,178.108.171.183,3.2.0,9700132ccc02e12a,c083b5d2-ec92-486f-a5b3-512dba1ce4ae,invoke_action,{""timestamp"":""2012-05-08 13:47:27""}

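The first line is correct. In the second line, however, the last field {""timestamp"":""2012-05-08 13:47:27""} is missing the double quotes that should enclose the braces, so when I try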

CSV.foreach(csv_file_path) do |row|
    puts "======================="
    puts row
    puts "======================="
end

the first row prints and then I get the error:

=======================
1336481227
178.108.171.183
3.2.0
9700132ccc02e12a
c083b5d2-ec92-486f-a5b3-512dba1ce4ae
invoke_action
{"timestamp":"2012-05-08 13:47:26","a":"b"}
=======================
#<CSV::MalformedCSVError: Illegal quoting in line 2.>

Is there any way to repair a line with this problem, or to simply skip it when the error occurs?

Edit: If I try

CSV.foreach(csv_file_path, :quote_char => "\'") do |row|
    puts "======================="
    puts row
    puts "======================="
end

the error goes away, but the JSON value in the first line is now broken:

=======================
1336481227
178.108.171.183
3.2.0
9700132ccc02e12a
c083b5d2-ec92-486f-a5b3-512dba1ce4ae
invoke_action
"{""timestamp"":""2012-05-08 13:47:26""
""a"":""b""}"
=======================
=======================
1336481227
178.108.171.183
3.2.0
9700132ccc02e12a
c083b5d2-ec92-486f-a5b3-512dba1ce4ae
invoke_action
{""timestamp"":""2012-05-08 13:47:27""}
=======================

2 Answers:

Answer 0: (score: 3)

Try

CSV.foreach(csv_file_path, :quote_char => "\'")
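
To see what this option changes, here is a minimal sketch that parses the malformed second row from the question twice. The variable names (`bad`, `default_error`, `relaxed`) are only illustrative. It assumes the row is exactly as shown in the question; with the single quote as the quote character, double quotes are treated as ordinary text rather than unescaped.

```ruby
require 'csv'

# The malformed second row from the question: the last field is not
# wrapped in double quotes.
bad = '1336481227,178.108.171.183,3.2.0,9700132ccc02e12a,' \
      'c083b5d2-ec92-486f-a5b3-512dba1ce4ae,invoke_action,' \
      '{""timestamp"":""2012-05-08 13:47:27""}'

# Default quote character ("): the stray quotes inside an unquoted
# field are rejected, so we capture the exception instead of a row.
default_error =
  begin
    CSV.parse_line(bad)
    nil
  rescue CSV::MalformedCSVError => e
    e
  end

# Single quote as the quote character: double quotes are plain text,
# so the row parses, but the doubled quotes stay in the field.
relaxed = CSV.parse_line(bad, quote_char: "'")

puts default_error.class
puts relaxed.last
```

The row no longer raises, but nothing is unescaped, and any comma inside the JSON value splits it into separate fields, which is the breakage shown in the question's edit.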

Answer 1: (score: 0)

I think the easiest way is to use a double gsub

require 'csv'
line = "1336481227,178.108.171.183,3.2.0,9700132ccc02e12a,c083b5d2-ec92-486f-a5b3-512dba1ce4ae,invoke_action,\"{\"\"timestamp\"\":\"\"2012-05-08 13:47:26\"\",\"\"a\"\":\"\"b\"\"}\""
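# Temporarily replace the doubled quotes so the parser never sees them.
masked = line.gsub('""', '%tmp%')
csv = CSV.new(masked).each.map do |row|
  # Put the doubled quotes back into every parsed field.
  row.map { |value| value.gsub('%tmp%', '""') }
end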

puts csv.inspect
# => [["1336481227", "178.108.171.183", "3.2.0", "9700132ccc02e12a", "c083b5d2-ec92-486f-a5b3-512dba1ce4ae", "invoke_action", "{\"\"timestamp\"\":\"\"2012-05-08 13:47:26\"\",\"\"a\"\":\"\"b\"\"}"]]
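
For the "repair or skip" part of the question, one possible approach is to parse each line on its own and fall back to a repair step only when the parser raises. The sketch below is not taken from either answer. `parse_or_repair` and `FIELD_COUNT` are hypothetical names, and it assumes that every row has exactly seven columns, that only the last column can be left unquoted, and that no field spans several lines.

```ruby
require 'csv'

# Assumption: every row has exactly seven columns.
FIELD_COUNT = 7

# Hypothetical helper: returns the parsed row, a repaired row, or nil
# when the line cannot be repaired (so the caller can skip it).
def parse_or_repair(raw_line)
  CSV.parse_line(raw_line)
rescue CSV::MalformedCSVError
  # Split off the first six columns; the remainder is the last field.
  parts = raw_line.chomp.split(',', FIELD_COUNT)
  return nil unless parts.size == FIELD_COUNT

  last = parts.pop
  # Wrap the last field in the double quotes it was missing, then retry.
  repaired = (parts + ["\"#{last}\""]).join(',')
  begin
    CSV.parse_line(repaired)
  rescue CSV::MalformedCSVError
    nil
  end
end

prefix = '1336481227,178.108.171.183,3.2.0,9700132ccc02e12a,' \
         'c083b5d2-ec92-486f-a5b3-512dba1ce4ae,invoke_action,'
good = prefix + '"{""timestamp"":""2012-05-08 13:47:26""}"'
bad  = prefix + '{""timestamp"":""2012-05-08 13:47:27""}'

[good, bad, 'a,b"c'].each do |raw|
  row = parse_or_repair(raw)
  if row
    puts row.last
  else
    puts "skipped: #{raw}"
  end
end
```

When reading from a file, the same helper could be called from `File.foreach(csv_file_path)` in place of `CSV.foreach`, provided the one-record-per-line assumption holds.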