How to find and replace elements in a CSV

Posted: 2014-09-26 16:17:57

Tags: ruby parsing csv

How can I find and replace a pattern in the following CSV sample, for example the quotes around "West"?

"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"

1 Answer:

Answer 0 (score: 3)

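You can't read this with the CSV class, because it's a malformed CSV string: the quotes around West sit inside a quoted field without being escaped. Sometimes this happens because whoever generated the data didn't know what they were doing: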

require 'csv'
foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'
arr_of_arrs = CSV.parse(foo)

which raises an exception:

Missing or stray quote in line 1 (CSV::MalformedCSVError)
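If you need to detect such lines instead of crashing, the exception can be rescued. This is a minimal sketch; try_parse is a hypothetical helper, and the exact error message differs between versions of the csv library (newer versions report something other than "Missing or stray quote"), so only the exception class is relied on.

```ruby
require 'csv'

# The sample line from the question: the quotes around West sit inside a
# quoted field without being doubled, so it is not valid CSV.
bad = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'

# Hypothetical helper: returns the parsed rows, or nil if the csv library
# rejects the input. The message text varies between csv versions.
def try_parse(text)
  CSV.parse(text)
rescue CSV::MalformedCSVError => e
  warn "could not parse: #{e.message}"
  nil
end

rows = try_parse(bad)
p rows  # prints nil
```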

Instead, you have to fix the data first and then parse it. Here's a starting point:

/(?<=\s)("[^"]+")(?=\s)/

http://rubular.com/r/sWEkx07Zyo

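The pattern looks for a quoted string with whitespace immediately before and after it. That whitespace is checked with a lookbehind and a lookahead, so it is not part of the match and is not captured.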
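A quick way to see what the lookarounds do is to inspect the MatchData: the match is only the quoted word, while the spaces stay in the text before and after it. A small sketch using the sample line:

```ruby
foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'
REGEX = /(?<=\s)("[^"]+")(?=\s)/

m = REGEX.match(foo)
p m[0]              # prints "\"West\"" (the quoted word, no spaces)
p m.pre_match[-1]   # prints " " (the space checked by the lookbehind)
p m.post_match[0]   # prints " " (the space checked by the lookahead)
```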

Here's some code for this specific example:

foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'

REGEX = /(?<=\s)("[^"]+")(?=\s)/

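word = foo[REGEX]          # the quoted word, "\"West\""
foo[REGEX] = word[1..-2]   # replace it with the word minus its first and last characters (the quotes)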
puts foo
# >> "LastName","FirstName","","","890","","6G","","S West AVENUE","","CITY","ZIP"

At that point CSV can parse it:

require 'csv'
arr_of_arrs = CSV.parse(foo)
# => [["LastName",
#      "FirstName",
#      "",
#      "",
#      "890",
#      "",
#      "6G",
#      "",
#      "S West AVENUE",
#      "",
#      "CITY",
#      "ZIP"]]
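foo[REGEX] = ... only replaces the first match. If a line might contain several badly quoted words, String#gsub with a block applies the same word[1..-2] trimming to every match. A sketch under that assumption; the two-word input line below is made up for illustration, and the regex is still a heuristic tuned to this sample (it expects whitespace on both sides of the bad quotes).

```ruby
require 'csv'

REGEX = /(?<=\s)("[^"]+")(?=\s)/

# Hypothetical line with two badly quoted words instead of one.
line = '"1","S "West" AVENUE","N "Main" ST"'

# gsub visits every match; the block drops the first and last character
# (the quotes) of each one, just like word[1..-2] does for a single match.
fixed = line.gsub(REGEX) { |quoted| quoted[1..-2] }
puts fixed   # prints "1","S West AVENUE","N Main ST"

row = CSV.parse_line(fixed)
p row        # prints ["1", "S West AVENUE", "N Main ST"]
```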

These lines might be confusing:

word = foo[REGEX]
foo[REGEX] = word[1..-2]

foo[...] is String#[] (and foo[...] = is String#[]=), part of the String class. It's a convenient way to find and replace characters in a string.
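Here is a small sketch of both methods on a single field. With a regexp, String#[] returns the matched text (or a capture group when given an index), and String#[]= replaces the first match in place, raising IndexError if there is no match. That last point matters if you run a replacement twice on the same string.

```ruby
# Unary + gives a mutable copy, in case string literals are frozen.
s = +'S "West" AVENUE'
re = /"([^"]+)"/

whole = s[re]      # String#[] with a regexp: the whole match, "\"West\""
group = s[re, 1]   # with a capture index: just group 1, "West"

s[re] = group      # String#[]= replaces the first match in place
p s                # prints "S West AVENUE"

# String#[]= raises IndexError when the regexp does not match,
# e.g. when the quotes have already been removed.
error = begin
  s[re] = 'x'
  nil
rescue IndexError => e
  e
end
p error.class      # prints IndexError
```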


The CSV parser can also be made happy with the embedded quotes, so if throwing them away goes too far, you can do something like this instead:

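# start again from the original, malformed string (foo was modified above)
foo = '"LastName","FirstName","","","890","","6G","","S "West" AVENUE","","CITY","ZIP"'

word = foo[REGEX]              # "\"West\""
foo[REGEX] = '"%s"' % word     # wrap it in another pair of quotes: ""West""

require 'csv'
arr_of_arrs = CSV.parse(foo)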
# => [["LastName",
#      "FirstName",
#      "",
#      "",
#      "890",
#      "",
#      "6G",
#      "",
#      "S \"West\" AVENUE",
#      "",
#      "CITY",
#      "ZIP"]]

That simply plays by the rules of the CSV spec (RFC 4180): a double quote inside a quoted field is escaped by doubling it.
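You can confirm the doubling rule by letting the csv library write the field itself. This sketch uses CSV.generate_line and CSV.parse_line; the two-field row is just an illustration. The writer produces the same doubled-quote shape that '"%s"' % word builds by hand, and reading it back restores the embedded quotes.

```ruby
require 'csv'

field = 'S "West" AVENUE'

# The writer quotes the field and doubles the quotes inside it.
line = CSV.generate_line(['890', field])
print line               # prints 890,"S ""West"" AVENUE"

# Reading it back recovers the original value with its quotes intact.
p CSV.parse_line(line)   # prints ["890", "S \"West\" AVENUE"]
```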