Question

So this doesn't work with Python's regular expressions:

>>> re.sub('oof', 'bar\\', 'foooof')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 270, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 257, in _compile_repl
    raise error, v # invalid expression
error: bogus escape (end of line)

I thought my eyes were deceiving me, so I tried this:

>>> re.sub('oof', "bar\x5c", 'foooof')

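Same result. I have searched and confirmed that other people run into this too. So what is wrong with treating repl as a plain string? Are there other formatting options that can go in repl?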

Answer 1

Yes, the replacement string is processed for escape characters. From the docs:

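repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.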
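To make the quoted behaviour concrete, here is a minimal sketch, assuming Python 3 (the thread itself uses Python 2.7). The sample strings and variable names are made up for illustration. It shows re.sub interpreting an escape and a backreference in a string repl, and rejecting a lone trailing backslash. Note that on Python 3.7 and later, unknown escapes made of a backslash and an ASCII letter raise an error instead of being left alone.

```python
import re

# r'bar\n' is five characters: b, a, r, backslash, n.
# re.sub processes the escape itself and emits a real newline.
with_newline = re.sub('oof', r'bar\n', 'foooof')
print(repr(with_newline))  # 'foobar\n'

# A backreference in repl is replaced by the text of the matching group.
swapped = re.sub(r'(o+)(f)', r'\2\1', 'foooof')
print(repr(swapped))  # 'ffoooo'

# A single trailing backslash is an incomplete escape, so re.sub raises re.error.
trailing_backslash_rejected = False
try:
    re.sub('oof', 'bar\\', 'foooof')
except re.error as exc:
    trailing_backslash_rejected = True
    print('re.error:', exc)
```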

Answer 2

Use a raw string:

re.sub('oof', r'bar\\', 'foooof')

Without the r prefix, you need to escape the backslash twice:

re.sub('oof', 'bar\\\\', 'foooof')
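As a quick check that the two spellings above are the same string, here is a small sketch assuming Python 3. The variable names are illustrative only. Both literals contain two backslashes, which re.sub then collapses into one.

```python
import re

raw_repl = r'bar\\'      # raw literal: b, a, r, backslash, backslash
plain_repl = 'bar\\\\'   # ordinary literal: each pair of backslashes becomes one

print(raw_repl == plain_repl)  # True
print(len(raw_repl))           # 5

# re.sub turns the two-backslash escape into one literal backslash.
result = re.sub('oof', raw_repl, 'foooof')
print(result)       # foobar\
print(len(result))  # 7
```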

Answer 3

If you don't want string escapes to be processed, you can use a lambda, and the string is returned unprocessed:

>>> re.sub('oof', lambda x: 'bar\\', 'foooof')
'foobar\\'
>>> s=re.sub('oof', lambda x: 'bar\\', 'foooof')
>>> print s
foobar\

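However, control characters in the result are still interpreted when it is printed (here the \r moves the cursor back to the start of the line, so the backslash overwrites the f):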

>>> re.sub('oof', lambda x: 'bar\r\\', 'foooof')
'foobar\r\\'
>>> print re.sub('oof', lambda x: 'bar\r\\', 'foooof')
\oobar

Alternatively, use a raw string:

>>> re.sub('oof', r'bar\\', 'foooof')
'foobar\\'

Answer 4

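Are you expecting foobar\ as output? If so, re.sub('oof', r'bar\\', 'foooof') is what you need; the r tells Python to treat what follows as a raw string, so backslashes are treated as backslashes rather than as a sign that the following character needs special handling. Here is a section of the documentation that explains this in more detail.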

re.sub trying to escape the repl string?
