How can I find and replace a pattern, such as the quotes around "West", in the following sample from a CSV file?
"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"
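Answer 0 (score: 3)
You can't use the CSV class to read this, because it's a malformed CSV string. That sometimes happens because whoever generated the data didn't know what they were doing: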
require 'csv'
foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'
arr_of_arrs = CSV.parse(foo)
That raises an exception:
Missing or stray quote in line 1 (CSV::MalformedCSVError)
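If you need to find out which lines are malformed before repairing them, one option is to rescue the exception class. This is only a sketch, and the helper name parseable? is made up for illustration:

```ruby
require 'csv'

# Hypothetical helper: true if the CSV parser accepts the line as-is.
def parseable?(line)
  CSV.parse_line(line)
  true
rescue CSV::MalformedCSVError
  # The parser rejected the line, e.g. because of a stray quote.
  false
end

puts parseable?('"LastName","FirstName"')      # well-formed line
puts parseable?('"6G","","S "West" AVENUE"')   # stray quotes inside a field
```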
Instead, to work around the problem, you have to fix the data first and then parse it. Here's a starting point:
/(?<=\s)("[^"]+")(?=\s)/
http://rubular.com/r/sWEkx07Zyo
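The pattern looks for a quoted string that has whitespace immediately before and after it. The whitespace is checked with a lookbehind and a lookahead, so it is not part of the match.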
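To see what the lookarounds do, here is a small sketch that inspects the match. The constant name QUOTED_WORD and the shortened input line are only for illustration:

```ruby
# Same pattern as above, under an illustrative name.
QUOTED_WORD = /(?<=\s)("[^"]+")(?=\s)/

# Hypothetical helper: returns the MatchData for the first quoted word, or nil.
def quoted_word_match(line)
  QUOTED_WORD.match(line)
end

m = quoted_word_match('"6G","","S "West" AVENUE",""')
puts m[0]                  # the matched text: quotes included, spaces excluded
puts m.pre_match.inspect   # everything before the match, ending with the space
puts m.post_match.inspect  # everything after the match, starting with the space
```

The spaces show up in pre_match and post_match rather than in the match itself, which is why replacing the match leaves the spacing intact.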
Here's some code for this particular example:
foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'
REGEX = /(?<=\s)("[^"]+")(?=\s)/
word = foo[REGEX]
foo[REGEX] = word[1..-2]
puts foo
# >> "LastName","FirstName","","","890","","6G","","S West AVENUE","","CITY","ZIP"
At this point CSV can be used:
require 'csv'
arr_of_arrs = CSV.parse(foo)
# => [["LastName",
# "FirstName",
# "",
# "",
# "890",
# "",
# "6G",
# "",
# "S West AVENUE",
# "",
# "CITY",
# "ZIP"]]
These lines might be confusing:
word = foo[REGEX]
foo[REGEX] = word[1..-2]
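foo[...] is part of the String class, and is a convenient way to find and replace characters in a string. Given a regular expression, foo[REGEX] returns the first match, and foo[REGEX] = ... replaces that first match in place.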
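A minimal sketch of how String#[] and String#[]= behave with a regular expression. The helper replace_first is hypothetical and only wraps the two calls:

```ruby
# Hypothetical helper, for illustration only.
def replace_first(text, pattern, replacement)
  copy = text.dup                   # String#[]= mutates the receiver, so work on a copy
  return copy unless copy[pattern]  # String#[] returns nil when nothing matches
  copy[pattern] = replacement       # String#[]= replaces only the first match
  copy
end

puts replace_first('abc 123 def 456', /\d+/, 'N')
# >> abc N def 456
```

The guard matters because assigning through a pattern that doesn't match raises an IndexError.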
It's also possible to make the CSV parser happy with the embedded quotes. If throwing them away is too drastic, you can do something like this:
word = foo[REGEX]
foo[REGEX] = '"%s"' % word
require 'csv'
arr_of_arrs = CSV.parse(foo)
# => [["LastName",
# "FirstName",
# "",
# "",
# "890",
# "",
# "6G",
# "",
# "S \"West\" AVENUE",
# "",
# "CITY",
# "ZIP"]]
This simply plays by the rules of the CSV specification, which represents a quote inside a quoted field as a doubled double-quote.
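The code above fixes only the first occurrence in a line. If a line might contain several embedded quoted words, a gsub-based variant is one way to handle them all. This is a sketch under assumptions: the names EMBEDDED_QUOTES and escape_embedded_quotes and the sample line are made up, and it relies on embedded quotes always being surrounded by whitespace. A field that ends with a space next to one that starts with a space could be matched by mistake.

```ruby
require 'csv'

# Same pattern as above, under an illustrative name.
EMBEDDED_QUOTES = /(?<=\s)("[^"]+")(?=\s)/

# Hypothetical helper: doubles the quotes around every embedded quoted word.
def escape_embedded_quotes(line)
  line.gsub(EMBEDDED_QUOTES) { |word| %("#{word}") }
end

line  = '"A","N "Main" ST","S "West" AVENUE"'
fixed = escape_embedded_quotes(line)
puts fixed
# >> "A","N ""Main"" ST","S ""West"" AVENUE"
p CSV.parse_line(fixed)
# >> ["A", "N \"Main\" ST", "S \"West\" AVENUE"]
```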