Question

I want to modify a string with the help of re.sub:

>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)

I expect to get:

'Here is <b>Sparta</b>.'

But I get an error instead:

>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 291, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.7/sre_parse.py", line 833, in expand_template
    raise error, "invalid group reference"
sre_constants.error: invalid group reference

How should I use re.sub to get the correct result?

Answer 1

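Your pattern does not define any capturing group, yet the replacement pattern uses a backreference to group 1. That is what raises the error.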
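To see the cause directly, here is a minimal sketch (the variable names `group_count` and `raised` are only illustrative) that checks how many groups the pattern defines and confirms that the \1 backreference is rejected:

```python
import re

pattern = re.compile("sparta", flags=re.IGNORECASE)

# The pattern contains no parentheses, so it defines zero capturing groups.
group_count = pattern.groups

try:
    pattern.sub(r"<b>\1</b>", "Here is Sparta.")
    raised = False
except re.error:
    # \1 has no group to refer to, so the replacement template is rejected.
    raised = True

print(group_count, raised)
```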

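Either define a capturing group in the pattern and use the matching backreference in the replacement pattern, or use \g<0> to refer to the whole match: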

re.sub("sparta", r"<b>\g<0></b>", "Here is Sparta.", flags=re.IGNORECASE)

See the Python demo.
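Both options described above can be put side by side. This is a small sketch rather than the only way to do it; the variable names are illustrative:

```python
import re

text = "Here is Sparta."

# Option 1: add a capturing group to the pattern and refer to it with \1.
with_group = re.sub(r"(sparta)", r"<b>\1</b>", text, flags=re.IGNORECASE)

# Option 2: keep the pattern as is; \g<0> refers to the whole match.
whole_match = re.sub(r"sparta", r"<b>\g<0></b>", text, flags=re.IGNORECASE)

print(with_group)
print(whole_match)
```

In both cases the original capitalization of "Sparta" is preserved, because the replacement reuses the matched text instead of the pattern text.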

Answer 2

When you use \x in the second string (I think it is called the replacement string), where x is a number, Python replaces it with group x.

You can define a group in a regex by wrapping it in parentheses, like this:

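re.sub(r"capture (['me]{2})", r'group 1: \1', 'capture me!') # => 'group 1: me!' (group 1 is "me"; the trailing "!" is unmatched text)
re.sub(r"capture (['me]{2})", r'group 1: \1', "capture 'em!") # => "group 1: 'em!" (group 1 is "'e"; the trailing "m!" is unmatched text)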

Nested captures? I've lost count!

The opening parenthesis determines a group's number:

(this is the first group (this is the second) (this is the third))
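The numbering rule can be checked with a tiny nested pattern. The pattern and input below are made up for illustration:

```python
import re

# Opening parentheses are counted left to right:
# group 1 starts first (outer), group 2 is "(b)", group 3 is "(c)".
m = re.match(r"(a(b)(c))", "abc")

groups = m.groups()
print(groups)
```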

Named groups

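Named groups are very useful when you work with the match object returned by re.match or re.search, for example (see the docs for more information), and also when you deal with complex regexes, because they bring clarity.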

You can name a group with the following syntax:

(?P<your_group_name>your pattern)

So, for example:

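# The original call omitted the input string; "hellotest a" is an assumed example input.
re.sub(r"(?P<first>hello(?P<second>[test]+)) (?P<third>[a-z])", r"first: \g<first>", "hellotest a") # => 'first: hellotest'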
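Named groups are also readable from a match object, which is where they tend to help most. A short sketch follows; the input string "say hellotest a" is an assumption chosen so that the pattern matches:

```python
import re

m = re.search(
    r"(?P<first>hello(?P<second>[test]+)) (?P<third>[a-z])",
    "say hellotest a",  # assumed sample input, not from the original answer
)

# Groups can be read by name...
first = m.group("first")
second = m.group("second")
third = m.group("third")

# ...and named groups still keep their positional numbers.
same_as_first = m.group(1)

print(m.groupdict())
```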

What is group `0`

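Group 0 is the entire match. However, you cannot use \0, because it is read as \x00 (the actual value of that escape code). The solution is to use the named-group syntax (regular groups are effectively named groups whose names are just integers): \g<0>. So, for example: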

re.sub(r'[hello]+', r'\g<0>', 'lehleo') # => lehleo
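The difference between \0 and \g<0> is easy to miss because a NUL character is usually invisible when printed. The sketch below, applied to the string from the question, compares the two; it assumes the replacement template treats \0 as an octal escape, as described above:

```python
import re

text = "Here is Sparta."

# \0 is treated as an octal escape, so a NUL character is inserted.
with_nul = re.sub(r"sparta", r"<b>\0</b>", text, flags=re.IGNORECASE)

# \g<0> refers to the entire match.
with_match = re.sub(r"sparta", r"<b>\g<0></b>", text, flags=re.IGNORECASE)

print(repr(with_nul))
print(repr(with_match))
```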

For your question

This answer is only meant to explain capturing, not really to answer your question, since @Wiktor Stribiżew's one is perfect.

How to use re.sub with IGNORECASE in Python 2.7?
