Question

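I'm using a regular expression in Python to extract data elements from a text file. I've run into a problem where the match captures too many parentheses.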

The text is stored in a string named temp, in this format:

temp='Somethingorother School District (additional text)|other stuff here'

I'm currently using

match = re.search(r'(.* School District) (\(.*\))\|?',temp)

which works well and matches

match.group(1) = Somethingorother School District
match.group(2) = (additional text)

However, sometimes the 'other stuff here' part also contains parentheses, like this:

'Somethingorother School District (additional text)|$59900000 (4.7 mills)'

so I get

match.group(2) = (additional text)|$59900000 (4.7 mills)

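I understand this happens because the * operator is greedy, but the (additional text) part is fairly idiosyncratic, and I want to capture whatever is inside those parentheses. In other words, I'd like it to be greedy within those parentheses, but stop looking once it matches a ). Is there a way to do this?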

Answer 1

Use a negated character class.

>>> match = re.search(r'(.* School District) (\([^()]*\))\|?',temp)
>>> match.group(1)
'Somethingorother School District'
>>> match.group(2)
'(additional text)'

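[^()]* matches any character except ( or ), zero or more times. Because it cannot cross a closing parenthesis, the group ends at the first ) after the opening (.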
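As a quick check, here is a minimal sketch that runs the negated-character-class pattern against both sample strings from the question. The names samples, pattern, and results are only illustrative and are not part of the original post.

```python
import re

# Both input shapes from the question: one without and one with
# extra parentheses after the pipe.
samples = [
    'Somethingorother School District (additional text)|other stuff here',
    'Somethingorother School District (additional text)|$59900000 (4.7 mills)',
]

# [^()]* cannot match ( or ), so group 2 must end at the first ).
pattern = re.compile(r'(.* School District) (\([^()]*\))\|?')

results = []
for temp in samples:
    match = pattern.search(temp)
    results.append((match.group(1), match.group(2)))

for district, extra in results:
    print(district, '->', extra)
```

Both strings should yield the same pair of groups. Note that this pattern assumes the parenthesized text contains no nested parentheses; nested input would need a different approach.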


Answer 2

Make the quantifier inside the last parenthesized group non-greedy, i.e. use .*? instead of .* there.
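A minimal sketch of what that could look like, assuming the only change is swapping .* for .*? inside the second group. The variable names greedy and lazy are illustrative.

```python
import re

temp = 'Somethingorother School District (additional text)|$59900000 (4.7 mills)'

# Original pattern: the greedy .* runs to the last ) in the string.
greedy = re.search(r'(.* School District) (\(.*\))\|?', temp)

# Non-greedy variant: .*? stops at the first ) that lets the match succeed.
lazy = re.search(r'(.* School District) (\(.*?\))\|?', temp)

print(greedy.group(2))  # (additional text)|$59900000 (4.7 mills)
print(lazy.group(2))    # (additional text)
```

This works here because everything after the group is optional. If a required token followed the group, a non-greedy quantifier could still expand past the first ) to satisfy it, which is one reason the negated character class is often the stricter choice.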

Regular expression matching too many parentheses
