Regex to split an extended CSV notation

Time: 2012-08-15 20:17:10

Tags: regex csv data-transfer

I have a custom transfer format that packs data in the following form:

[A:000, "Name", "Field", "Field", "Field"]

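I am trying to split each line apart to get the first character after the opening bracket plus all of the CSV values, that is: A, 000, "Name", "Field", "Field", and so on.

I cobbled together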

[^?,:\[\]]

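This matches one character at a time between the colon and comma delimiters instead of whole fields. I also realize it cannot cope with commas inside the quotes. So it is clearly garbage!

Embedded commas are not a big problem, though: we control the data at both ends, so I can escape them.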
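
As an illustration of why the attempt above returns single characters, here is a minimal sketch in Python (the language is an assumption; the question does not name one). The character class matches exactly one character that is not `?`, `,`, `:`, `[` or `]`; adding a `+` quantifier turns each match into a whole run between delimiters. The variable names are hypothetical, and the sketch only holds while fields contain no embedded commas.

```python
import re

record = '[A:000, "Name", "Field", "Field", "Field"]'

# The original class: each match is ONE character that is not ? , : [ or ]
single_chars = re.findall(r'[^?,:\[\]]', record)

# With a + quantifier, each match is a whole run between delimiters
runs = re.findall(r'[^?,:\[\]]+', record)

# Trim the leading space and the surrounding quotes from each run
fields = [run.strip().strip('"') for run in runs]

print(single_chars[:4])  # ['A', '0', '0', '0']
print(fields)            # ['A', '000', 'Name', 'Field', 'Field', 'Field']
```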

Thanks for any insight!

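2 answers:

Answer 0: (score: 2)

Try matching what you actually want, rather than splitting on several characters and ignoring some of them. Since you didn't specify an implementation language, I am posting this in Perl, but you can apply it to any regex flavor that supports lookbehind and lookahead.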

while ($subject =~ m/(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))/g) {
    # the matched text is in $& (and also in $1, since the whole pattern is one capture group)
}
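
For readers not using Perl, the same pattern can be tried unchanged in Python's `re` module, which also supports fixed-width lookbehind. This is only a sketch: the function name `extract_fields` and the two sample records are assumptions for illustration. Note that the third alternative requires the quote to follow the comma directly, so a record written with a space after each comma yields only the first two values.

```python
import re

# The pattern from the Perl loop above, unchanged
ANSWER_PATTERN = re.compile(r'(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))')

def extract_fields(record: str) -> list:
    # With a single capture group, findall returns that group's text per match
    return ANSWER_PATTERN.findall(record)

compact = '[A:000,"Name","Field","Field","Field"]'
spaced = '[A:000, "Name", "Field", "Field", "Field"]'

print(extract_fields(compact))  # ['A', '000', 'Name', 'Field', 'Field', 'Field']
print(extract_fields(spaced))   # ['A', '000'] because no quote directly follows a comma
```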

Explanation

# (\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))
# 
# Match the regular expression below and capture its match into backreference number 1 «(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))»
#    Match either the regular expression below (attempting the next alternative only if this one fails) «\w+(?=:)»
#       Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
#          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
#       Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=:)»
#          Match the character “:” literally «:»
#    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «(?<=:)\d+»
#       Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=:)»
#          Match the character “:” literally «:»
#       Match a single digit 0..9 «\d+»
#          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
#    Or match regular expression number 3 below (the entire group fails if this one fails to match) «(?<=,")[^"]*?(?=")»
#       Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=,")»
#          Match the characters “,"” literally «,"»
#       Match any character that is NOT a “"” «[^"]*?»
#          Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
#       Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
#          Match the character “"” literally «"»
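
As the breakdown shows, the third alternative only matches text that sits directly after `,"`, so the sample record from the question (which has a space after each comma) would lose its quoted fields. One possible variant, sketched here in Python as an assumption rather than as part of the original answer, consumes the quotes instead of asserting them with lookarounds and gives each alternative its own capture group. The names `TOLERANT` and `extract_fields_tolerant` are hypothetical.

```python
import re

# Hypothetical variant: the quotes are consumed by the match, so the
# spacing around the commas no longer matters
TOLERANT = re.compile(r'(\w+)(?=:)|(?<=:)(\d+)|"([^"]*)"')

def extract_fields_tolerant(record: str) -> list:
    # Exactly one group takes part in each match; lastindex is its number
    return [m.group(m.lastindex) for m in TOLERANT.finditer(record)]

print(extract_fields_tolerant('[A:000, "Name", "Field", "Field", "Field"]'))
# ['A', '000', 'Name', 'Field', 'Field', 'Field']
```

Because the quoted text is consumed whole, a comma inside the quotes does not split the field either.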


Answer 1: (score: 0)

Sure, you can do this with a regex, but the right tool is most likely a CSV parser. You could try this one for Objective-C by Dave DeLong:

https://github.com/davedelong/CHCSVParser
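
To illustrate the parser-based approach without relying on CHCSVParser's API (which is not shown here), the following is a rough sketch using Python's standard `csv` module as a stand-in. The function name `parse_record` and the assumed record layout (`[TAG:CODE, "field", ...]`) are taken from the question's sample, not from any specification of the format.

```python
import csv

def parse_record(record: str) -> list:
    body = record.strip()
    if not (body.startswith('[') and body.endswith(']')):
        raise ValueError('record must be wrapped in [ ]')
    body = body[1:-1]

    # Everything before the first comma is the TAG:CODE header
    head, _, rest = body.partition(',')
    tag, colon, code = head.partition(':')
    if not colon:
        raise ValueError('missing ":" between tag and code')

    # Let the CSV parser handle quoting; skipinitialspace drops the
    # blank that follows each comma in the sample format
    fields = next(csv.reader([rest.lstrip()], skipinitialspace=True))
    return [tag.strip(), code.strip(), *fields]

print(parse_record('[A:000, "Name", "Field, with comma", "Field"]'))
# ['A', '000', 'Name', 'Field, with comma', 'Field']
```

Since the parser understands quoting, embedded commas need no extra escaping on the sending side.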