Question

I'm trying to remove all parentheses and square brackets together with the text inside them. I'm using the regex

re.sub(r'\(.*\) | \[.*\]', '', text)

This works for the following:

import re
text = 'the (quick) brown fox jumps over the [lazy] dog'
print(re.sub(r'\(.*\) | \[.*\]', '', text))

> the brown fox jumps over the dog

text = '(the quick) brown fox jumps over the [lazy] dog'
print(re.sub(r'\(.*\) | \[.*\]', '', text))

> brown fox jumps over the dog

But it fails when the entire string should match the regex:

text = '[the quick brown fox jumps over the lazy dog]'
print(re.sub(r'\(.*\) | \[.*\]', '', text))

> [the quick brown fox jumps over the lazy dog]

> # This should be '' (the empty string) #

Where am I going wrong?

Answer 1

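You have extra spaces in your regex. Spaces inside a pattern are matched literally, so just remove the spaces before and after the |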

re.sub(r'\(.*\)|\[.*\]', '', text)

Or make the spaces optional, so that the output matches what you currently get

re.sub(r'\(.*\)\s?|\s?\[.*\]', '', text)
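To see why the spaces matter, here is a small sketch comparing the three patterns from this thread on the question's inputs. The names `ORIGINAL`, `NO_SPACES` and `OPTIONAL_SPACES` are just labels chosen for this example.

```python
import re

# Pattern from the question: the spaces around | are literal characters.
ORIGINAL = r'\(.*\) | \[.*\]'
# First fix: no spaces around the alternation.
NO_SPACES = r'\(.*\)|\[.*\]'
# Second fix: a neighbouring whitespace character is consumed only if present.
OPTIONAL_SPACES = r'\(.*\)\s?|\s?\[.*\]'

whole = '[the quick brown fox jumps over the lazy dog]'
mixed = 'the (quick) brown fox jumps over the [lazy] dog'

# There is no space before "[", so the original pattern removes nothing.
unchanged = re.sub(ORIGINAL, '', whole)
# Both fixes remove the fully bracketed string.
fixed = re.sub(NO_SPACES, '', whole)
fixed_optional = re.sub(OPTIONAL_SPACES, '', whole)

# On the mixed sentence the two fixes differ in the whitespace they leave.
loose = re.sub(NO_SPACES, '', mixed)
tight = re.sub(OPTIONAL_SPACES, '', mixed)

for result in (unchanged, fixed, fixed_optional, loose, tight):
    print(repr(result))
```

The first fix leaves doubled spaces where the bracketed text used to be, while the optional-space version reproduces the single-spaced output shown in the question.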

Answer 2

You have an extra space in the pattern that it is trying to match :)

Try:

re.sub(r'\(.*\)|\[.*\]', '', text)

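When a regex does something strange like this, a good place to test it is an online regex tester. It is a nice interactive way to see what is going wrong. For example, in your case the pattern does not match "(pace)", but it matches "(pace) " as soon as I add a space.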
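The same check can be reproduced locally. A minimal sketch using `re.search` on the sample string from this answer; the variable names are made up for the example.

```python
import re

# The question's pattern, spaces included.
pattern = re.compile(r'\(.*\) | \[.*\]')

# Without a trailing space the first alternative cannot match.
no_space = pattern.search('(pace)')
# With a trailing space it can.
with_space = pattern.search('(pace) ')

print(no_space)                    # None
print(repr(with_space.group(0)))   # the matched text, trailing space included
```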

Note:

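As I mentioned in the comments, be aware that greedy matching can do unexpected things if your text contains a stray ")" that is just a standalone symbol. Consider reluctant (non-greedy) matching instead: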

re.sub(r'\(.*?\)|\[.*?\]', '', text)

This turns:

"This is a (small) sample text with a ) symbol" ===> "This is a  sample text with a ) symbol"

whereas your current greedy pattern gives:

"This is a (small) sample text with a ) symbol" ===> "This is a  symbol"
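A quick sketch to reproduce the difference on the sample sentence above. The variable names are chosen for this example only.

```python
import re

sample = 'This is a (small) sample text with a ) symbol'

# Greedy: .* runs to the last ")" on the line, swallowing the text in between.
greedy = re.sub(r'\(.*\)|\[.*\]', '', sample)
# Reluctant: .*? stops at the first ")" it can.
lazy = re.sub(r'\(.*?\)|\[.*?\]', '', sample)

print(repr(greedy))
print(repr(lazy))
```

Both results keep the spaces that surrounded the removed text, so a doubled space remains where the match used to be.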

Answer 3

import re
text = '''[the quick brown fox jumps over the lazy dog]
the (quick) brown fox jumps over the [lazy] dog
(the quick) brown fox jumps over the [lazy] dog'''
print(re.sub(r'[(\[].+?[)\]]', '', text))

out:

the  brown fox jumps over the  dog
 brown fox jumps over the  dog
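Since this answer comes without an explanation, here is a short sketch of how the character-class pattern behaves at the edges. The opening and closing classes are independent, so a mismatched pair such as `(lazy]` is removed too, and `.+?` needs at least one character, so an empty `()` is left alone. The `tidy` helper that collapses the leftover doubled spaces is an addition for illustration, not part of the answer.

```python
import re

PATTERN = r'[(\[].+?[)\]]'

# Mismatched delimiters are still treated as a pair.
mismatched = re.sub(PATTERN, '', 'the (lazy] dog')
# .+? requires at least one character between the delimiters.
empty_pair = re.sub(PATTERN, '', 'call()')


def tidy(text):
    """Remove bracketed spans, then collapse the runs of spaces left behind."""
    stripped = re.sub(PATTERN, '', text)
    return re.sub(r' {2,}', ' ', stripped).strip()


cleaned = tidy('(the quick) brown fox jumps over the [lazy] dog')

print(repr(mismatched))
print(repr(empty_pair))
print(repr(cleaned))
```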

Python regex works for substring matches but not for the whole string
