Question

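I'm using Python 2.7, and I'm quite familiar with regular expressions and how to use them in Python. I would like to use a regex to replace comma delimiters with semicolons. The problem is that data wrapped in double quotes should retain its embedded commas. Here is an example: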

Before:

"3,14","1,000,000",hippo,"cat,dog,frog",plain text,"2,25"

After:

"3,14";"1,000,000";hippo;"cat,dog,frog";plain text;"2,25"

Is there a single regex that can do this?

Answer 1

# Python 2.7
import re

text = '''
  "3,14","1,000,000",hippo,"cat,dog,frog",plain text,"2,25"
'''.strip()

print "Before: " + text
print "After:  " + ";".join(re.findall(r'(?:"[^"]+"|[^,]+)', text))

This produces the following output:

Before: "3,14","1,000,000",hippo,"cat,dog,frog",plain text,"2,25"
After:  "3,14";"1,000,000";hippo;"cat,dog,frog";plain text;"2,25"

If you need more customization, you can modify this pattern.
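As a starting point for such customization, here is a minimal sketch that wraps the same pattern in a helper. The names `FIELD` and `commas_to_semicolons` are hypothetical, not part of the answer. It also shows one property of the pattern worth knowing: both alternatives require at least one character, so an empty field between two commas is dropped.

```python
import re

# Pattern from the answer above: a quoted field, or a run of non-comma characters.
FIELD = re.compile(r'(?:"[^"]+"|[^,]+)')

def commas_to_semicolons(line):
    """Hypothetical helper: rejoin the fields found by FIELD with ';'."""
    return ";".join(FIELD.findall(line))

print(commas_to_semicolons('"3,14",hippo,"cat,dog,frog"'))
print(commas_to_semicolons('a,,b'))  # the empty middle field is lost: a;b
```

If your data can contain empty fields, one of the substitution-based answers may be a better fit, since they replace the separators instead of collecting the fields.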

Answer 2

You can use:

>>> import re
>>> s = 'foo bar,"3,14","1,000,000",hippo,"cat,dog,frog",plain text,"2,25"'
>>> print re.sub(r'(?=(([^"]*"){2})*[^"]*$),', ';', s)
foo bar;"3,14";"1,000,000";hippo;"cat,dog,frog";plain text;"2,25"


The comma is matched only if it is followed by an even number of quotes, counted up to the end of the string.
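To make the even-quote rule concrete, the sketch below counts the quotes after each comma in a short sample and compares that with what the pattern replaces. The helper name `explain_commas` and the sample string are illustrative assumptions; the pattern itself is unchanged.

```python
import re

# Pattern from the answer above, unchanged.
OUTSIDE_COMMA = re.compile(r'(?=(([^"]*"){2})*[^"]*$),')

def explain_commas(s):
    """Return (index, quotes_after, is_separator) for every comma in s."""
    rows = []
    for m in re.finditer(',', s):
        quotes_after = s.count('"', m.end())
        # An even count means the comma sits outside any quoted field.
        rows.append((m.start(), quotes_after, quotes_after % 2 == 0))
    return rows

sample = 'a,"b,c",d'
for row in explain_commas(sample):
    print(row)
print(OUTSIDE_COMMA.sub(';', sample))  # a;"b,c";d
```

The comma at index 4 has one quote after it (an odd count), so it is inside a quoted field and is left alone.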

Answer 3

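Here is another approach that avoids scanning the whole string to its end with a lookahead at every occurrence. It is (more or less) an emulation of the \G feature for the re module. Instead of testing what follows the comma, this pattern matches the item before the comma (and the comma itself), and it is written so that each whole match is contiguous with the previous one.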

re.sub(r'(?:(?<=,)|^)(?=("(?:"")*(?:[^"]+(?:"")*)*"|[^",]*))\1,', r'\1;', s)


Details:

(?:          # ensures that matches are contiguous
    (?<=,)        # preceded by a comma (the one that ended the previous match)
  |             # OR
    ^             # at the start of the string
)
(?= # (?=(a+))\1 is a way to emulate an atomic group: (?>a+)
    (                        # capture the preceding item in group 1
        "(?:"")*(?:[^"]+(?:"")*)*"  # an item between quotes
      |
        [^",]*               # an item without quotes
    )
) \1  # back-reference to capture group 1
,

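The advantage of this approach is that it reduces the number of steps needed to obtain a match, and the step count stays nearly constant whatever the preceding item is (see the regex101 debugger). The reason is that every character is matched/tested only once. So even though the pattern is longer, it is more efficient, and the gain grows with the length of the line.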

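The atomic group trick is only there to reduce the number of steps before the attempt fails on the last item (which is not followed by a comma).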

Note that the pattern handles quoted items that contain escaped quotes (two consecutive quotes): "abcd""efgh""ijkl","123""456""789",foo
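The escaped-quote case can be checked with a short sketch. The names `ITEM_THEN_COMMA` and `commas_to_semicolons` are hypothetical; the pattern and the sample line are the ones from this answer.

```python
import re

# Pattern from this answer: match one item plus the comma that follows it.
ITEM_THEN_COMMA = re.compile(
    r'(?:(?<=,)|^)(?=("(?:"")*(?:[^"]+(?:"")*)*"|[^",]*))\1,'
)

def commas_to_semicolons(line):
    """Hypothetical helper: keep the item (group 1), swap its trailing comma."""
    return ITEM_THEN_COMMA.sub(r'\1;', line)

escaped = '"abcd""efgh""ijkl","123""456""789",foo'
print(commas_to_semicolons(escaped))
# "abcd""efgh""ijkl";"123""456""789";foo
```

One behaviour worth checking against your own data: when the last field is quoted and contains two or more commas (for example a,"b,c,d"), the attempt at that field fails for lack of a trailing comma, the scan resumes inside the quotes, and embedded commas after the first one get replaced.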

Answer 4

You can split with a regex and then join the pieces:

>>> ';'.join([i.strip(',') for i in re.split(r'(,?"[^"]*",?)?',s) if i])
'"3,14";"1,000,000";hippo;"cat,dog,frog";plain text;"2,25"'

Answer 5

This regex seems to do the job:

,(?=(?:[^"]*"[^"]*")*[^"]*\Z)

Adapted from: How to match something with regex that is not between two special characters?

Tested using http://pythex.org/
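The answer gives only the pattern, so here is a minimal sketch of how it might be applied with the re module. The names `SEPARATOR` and `commas_to_semicolons` are hypothetical. Because the pattern has no capturing groups, it can also be passed to `split` to get the fields as a list.

```python
import re

# Pattern from this answer: a comma followed by an even number of quotes.
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*\Z)')

def commas_to_semicolons(line):
    """Hypothetical helper: replace only the commas outside quoted fields."""
    return SEPARATOR.sub(';', line)

line = '"3,14","1,000,000",hippo,"cat,dog,frog",plain text,"2,25"'
print(commas_to_semicolons(line))
print(SEPARATOR.split(line))  # the same separators, used to split into fields
```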
