Question

I am using Python's regular expression module, re.

I need to match everything between '(' and ')' in these two phrases, but "not so greedily". For example:

show the (name) of the (person)

calc the sqrt of (+ (* (2 4) 3))

For phrase 1, the result should be:

name
person

For phrase 2, the result should be:

+ (* (2 4) 3)

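The problem is that, to fit the first phrase, I used '\(.*?\)'

On the second phrase, this matches only + (* (2 4)

Using '\(.*\)' fits the second phrase correctly, but on the first phrase it matches (name) of the (person)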
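
To make the behaviour described above easy to reproduce, here is a minimal sketch using the standard re module. The variable names and the capture groups are additions for illustration (the patterns in the question have no groups); the groups are there only so that findall returns the text inside the parentheses.

```python
import re

phrase1 = "show the (name) of the (person)"
phrase2 = "calc the sqrt of (+ (* (2 4) 3))"

# Lazy: the group stops at the first ')' it can reach.
lazy = re.compile(r"\((.*?)\)")
# Greedy: the group runs up to the last ')' on the line.
greedy = re.compile(r"\((.*)\)")

lazy_1 = lazy.findall(phrase1)      # ['name', 'person']
lazy_2 = lazy.findall(phrase2)      # ['+ (* (2 4']
greedy_1 = greedy.findall(phrase1)  # ['name) of the (person']
greedy_2 = greedy.findall(phrase2)  # ['+ (* (2 4) 3)']

for label, result in [("lazy/1", lazy_1), ("lazy/2", lazy_2),
                      ("greedy/1", greedy_1), ("greedy/2", greedy_2)]:
    print(label, result)
```

Neither quantifier tracks nesting depth: the lazy one closes too early on phrase 2, and the greedy one closes too late on phrase 1.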

What regex works correctly on both phrases?

Answer 1

Pyparsing makes it easy to write simple one-off parsers for things like this:

>>> text = """show the (name) of the (person)
...
... calc the sqrt of (+ (* (2 4) 3))"""
>>> import pyparsing
>>> for match in pyparsing.nestedExpr('(',')').searchString(text):
...   print(match[0])
...
['name']
['person']
['+', ['*', ['2', '4'], '3']]

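Note that the nested parentheses have been discarded, and the nested text is returned as a nested structure.

If you want the original text of each parenthesized group, use the originalTextFor modifier: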

>>> for match in pyparsing.originalTextFor(pyparsing.nestedExpr('(',')')).searchString(text):
...   print(match[0])
...
(name)
(person)
(+ (* (2 4) 3))

Answer 2

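What you are trying to do looks like a job for the shunting-yard algorithm (actually it looks like LISP, so maybe you should check out PyLisp). There is no need to use regular expressions to parse expressions like these.

See the Shunting yard article on Wikipedia and its Python implementation.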
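
To illustrate the "parse it instead of matching it" idea, here is a minimal sketch of a reader for Lisp-style nested parentheses. It is neither the shunting-yard algorithm nor PyLisp; parse_sexpr is a hypothetical helper written for this example, and it assumes atoms are whitespace-separated tokens with no string literals or escapes.

```python
import re

def parse_sexpr(text):
    """Turn '(+ (* (2 4) 3))' into nested Python lists of string tokens."""
    # A token is either a single parenthesis or a run of other non-space chars.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()    # close the current list
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)     # plain atom
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]

tree = parse_sexpr("(+ (* (2 4) 3))")
print(tree)  # [['+', ['*', ['2', '4'], '3']]]
```

Words outside any parentheses would simply appear as top-level atoms, so for the full phrases in the question you would first isolate the parenthesized part.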

Answer 3

This matches all the required information:

(?:\()(.*?\){2})|(?:\()(.*?)(?:\))

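Group 1 = + (* (2 4) 3))

The extra trailing ')' can be removed, for example with [:-1]; note that .strip(')') removes every trailing ')', not just one.

Group 2 = name, person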
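
As a rough check of the pattern above, the following sketch runs it over both phrases with re.finditer. The helper name extract is made up for this example, and slicing off the last character is one assumed way of dropping the extra ')' that group 1 captures.

```python
import re

pattern = re.compile(r"(?:\()(.*?\){2})|(?:\()(.*?)(?:\))")

def extract(phrase):
    """Return the parenthesized contents found by the two-branch pattern."""
    results = []
    for m in pattern.finditer(phrase):
        if m.group(1) is not None:
            # Branch 1 ends on '))', so group 1 carries one ')' too many.
            results.append(m.group(1)[:-1])
        else:
            results.append(m.group(2))
    return results

print(extract("show the (name) of the (person)"))   # ['name', 'person']
print(extract("calc the sqrt of (+ (* (2 4) 3))"))  # ['+ (* (2 4) 3)']
```

The first branch only succeeds when the text contains two consecutive closing parentheses, so this works for the two phrases in the question but not for nesting in general: an input such as (a (b) c) falls through to the lazy branch and yields a (b.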

Answer 4

As long as the parentheses are not nested, you can use a lazy regex:

\(.*?\)

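While it is theoretically possible to parse a limited depth of nesting with a regular expression, it is hard and not worth the effort. It is much easier to do this with a custom Python function. For a detailed explanation, see this answer.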
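
A minimal sketch of such a custom function, assuming the goal is the text of each outermost parenthesized group. The name top_level_groups is hypothetical, and the handling of unbalanced input (stray ')' ignored, unclosed '(' dropped) is an arbitrary choice made for this example.

```python
def top_level_groups(text):
    """Return the contents of each outermost (...) group in text."""
    groups = []
    depth = 0      # current nesting depth
    start = None   # index just after the '(' that opened the current group
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                # Back at the top level: the outermost group just closed.
                groups.append(text[start:i])
    return groups

print(top_level_groups("show the (name) of the (person)"))   # ['name', 'person']
print(top_level_groups("calc the sqrt of (+ (* (2 4) 3))"))  # ['+ (* (2 4) 3)']
```

Because it counts depth explicitly, the same function handles both phrases and any depth of nesting.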

Regular expression: matching parentheses both greedily and non-greedily
