Question

Adding re.IGNORECASE to my regular expression causes some matches to fail. Here is what I tried:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', re.IGNORECASE)
>>>'this ~is~ some tandom. text+ and [some] symbols {+/\\-}'

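As you can see, many of the symbols above were not replaced with '~'. But when I try the same thing without re.IGNORECASE, every special character is replaced with '~':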

print re.sub(r'[^a-zA-Z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}')
>>> 'this ~is~ some tandom~ text~ and ~some~ symbols ~~~~~~'

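Am I missing something about re.IGNORECASE? Shouldn't it only make letters match regardless of case and leave everything else (digits, special characters, etc.) unaffected? (In case it helps, I am using Anaconda's Python 2.7.)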

Answer 1

You passed the flag in the wrong argument position. Use:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', flags=re.IGNORECASE)
# or
print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', 0, re.IGNORECASE)


See the re.sub docs:

re.sub(pattern, repl, string, count=0, flags=0) The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer.

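You passed the flag where count is expected. re.IGNORECASE is an integer constant with the value 2, so count became 2 and only the first two matches were replaced instead of all of them.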
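A minimal sketch of the difference, assuming the same pattern and input string as in the question. The variable names (text, pattern, limited, full) are only illustrative. It relies on re.IGNORECASE having the integer value 2, which is why passing it as the fourth positional argument behaves like count=2. The print() calls use a single argument so the snippet should run on both Python 2.7 and Python 3.

```python
import re

text = r'this (is) some tandom. text+ and [some] symbols {+/\-}'
pattern = r'[^a-z0-9 ]'

# The flag is a plain integer constant, so in the count position
# it is read as "replace at most 2 occurrences".
flag_value = int(re.IGNORECASE)

# What the original call effectively did: count=2, no flags.
limited = re.sub(pattern, '~', text, count=flag_value)

# The intended call: unlimited replacements, case-insensitive matching.
full = re.sub(pattern, '~', text, flags=re.IGNORECASE)

print(limited)
print(full)
```

Passing flags by keyword, as in the second call, avoids depending on the positional order of count and flags.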

Unexpected behavior of re.IGNORECASE in Python 2.7
