Question

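I need to parse annotations of methods written in PHP. I wrote a regex (see the simplified example below) to search for them, but it doesn't work as expected. Instead of matching the shortest span of text between /** and */, it matches the largest possible amount of source code (including earlier methods and their annotations). I'm sure I'm using the correct non-greedy .*? version of *, and I found no evidence that DOTALL turns it off. Where could the problem be? Thank you.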

p = re.compile(r'(?:/\*\*.*?\*/)\n\s*public', re.DOTALL)
methods = p.findall(text)

Answer 1

I think this is what you're trying to get:

>>> text = """ /** * comment */ class MyClass extends Base { /** * comment */ public function xyz """
>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/\s*public', text, re.DOTALL)
>>> m
['/** * comment */ public']

If you don't want public in the final match, use the regex below, which relies on a positive lookahead:

>>> m = re.findall(r'\/\*\*(?:(?!\*\/).)*\*\/(?=\s*public)', text, re.DOTALL)
>>> m
['/** * comment */']

Answer 2

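A regex engine works from left to right. A lazy quantifier tries to match as little as possible from the current match position, but it cannot move the start of the match forward, even if that would reduce the amount of text matched. This means that instead of starting at the last /** before public, the regex matches from the first /** to the next */ that is attached to public.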
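The behavior described above can be reproduced with a short sketch. The sample text is a hypothetical PHP fragment modeled on the snippets in this thread, and the pattern is the one from the question.

```python
import re

# Hypothetical sample input, modeled on the PHP snippets in this thread.
text = """/** first comment */
class MyClass extends Base {

/** second comment */
public function xyz() {}
}
"""

# The pattern from the question: lazy .*? combined with DOTALL.
lazy = re.compile(r'(?:/\*\*.*?\*/)\n\s*public', re.DOTALL)

match = lazy.search(text)

# The match begins at the first /** (index 0), not at the one next to public,
# so it spans both comments.
print(match.start())               # 0
print(match.group().count('/**'))  # 2
```

The lazy quantifier keeps the match as short as it can once the start is fixed at index 0, but the only way to reach public from there is to run past the first */.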

If you want to exclude */ from the comment body, you need to group the . with a lookahead assertion:

(?:(?!\*/).)

(?!\*/) asserts that the character we are about to match is not the start of a */ sequence.
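As a minimal sketch of what that group does on its own, the snippet below repeats it with * and applies it to a hypothetical string containing two comment terminators. The names body and sample are made up for illustration.

```python
import re

# The guarded dot from this answer, repeated with * and DOTALL.
body = re.compile(r'(?:(?!\*/).)*', re.DOTALL)

# Hypothetical input containing two */ terminators.
sample = ' first */ class Foo { /** second */'

# match() anchors at index 0; the repetition stops right before the first */,
# because the lookahead refuses to consume the * that begins it.
consumed = body.match(sample).group()
print(repr(consumed))  # ' first '
```

Because the repetition can never step over a */, a pattern built around it cannot stretch one comment across another, no matter what follows.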

Answer 3

You should be able to use this:

\/\*\*([^*]|\*[^/])*?\*\/\s*public

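This matches any character that is not an asterisk (*); if the character is an asterisk, it must not be followed by a forward slash. This means it should only capture the comment that closes right before public, and nothing earlier.

Example: http://regexr.com/398b3

Explanation: http://tinyurl.com/lcewdmo

Disclaimer: this will not work if the comment contains */.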
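For anyone trying this pattern from Python, here is a small sketch against a hypothetical sample. Since the pattern contains a capturing group, re.findall would return only that group's last repetition, so the sketch reads the whole match through finditer and group(0) instead.

```python
import re

# Hypothetical sample input, modeled on the PHP snippets in this thread.
text = """/** first comment */
class MyClass extends Base {

/** second comment */
public function xyz() {}
}
"""

# The pattern from this answer. No DOTALL is needed: [^*] already
# matches newlines.
pattern = re.compile(r'\/\*\*([^*]|\*[^/])*?\*\/\s*public')

# group(0) is the full match, regardless of the capturing group inside.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['/** second comment */\npublic']
```

Changing the group to (?:...) would let findall return the full match directly.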

Answer 4

# Some examples, assuming that the annotation you want to parse
# starts with /** and ends with */. It may be spread over
# several lines.

text = """
/**
 @Title(value='Welcome', lang='en')
 @Title(value='Wilkommen', lang='de')
 @Title(value='Vitajte', lang='sk')
 @Snippet
    ,*/
class WelcomeScreen {}

   /** @Target("method") */
  class Route extends Annotation {}

/** @Mapping(inheritance = @SingleTableInheritance,
    columns = {@ColumnMapping('id'), @ColumnMapping('name')}) */
public Person {}

"""

text2 = """ /** * comment */
CLASS MyClass extends Base {

/** * comment */
public function xyz
"""


import re

# Match a PHP annotation and the word following class or public
# function.
annotations = re.findall(r"""/\*\*             # Starting annotation
                                               # 
                            (?P<annote>.*?)    # Named, non-greedy match
                                               # including newline
                                               #
                             \*/               # Ending annotation
                                               #
                             (?:.*?)           # Non-capturing non-greedy
                                               # including newline
                 (?:public[ ]+function|class)  # Match either
                                               # of these
                             [ ]+              # One or more spaces
                             (?P<name>\w+)     # Match a word
                         """,
                         text + text2,
                         re.VERBOSE | re.DOTALL | re.IGNORECASE)

for txt in annotations:
     print("Annotation: "," ".join(txt[0].split()))
     print("Name: ", txt[1])
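# A variation on the loop above, as a sketch: since the groups are named,
# re.finditer lets you read them by name instead of by tuple index. The
# sample string here is a hypothetical, shortened input.

```python
import re

# Hypothetical, shortened input in the style of the samples above.
sample = """
/** @Target("method") */
class Route extends Annotation {}

/** * comment */
public function xyz
"""

pattern = re.compile(r"""/\*\*                         # opening /**
                         (?P<annote>.*?)               # annotation body, non-greedy
                         \*/                           # closing */
                         .*?                           # anything up to the keyword
                         (?:public[ ]+function|class)  # either keyword
                         [ ]+                          # one or more spaces
                         (?P<name>\w+)                 # the declared name
                      """, re.VERBOSE | re.DOTALL | re.IGNORECASE)

# Collapse whitespace in the annotation body and pair it with the name.
found = [(" ".join(m.group("annote").split()), m.group("name"))
         for m in pattern.finditer(sample)]

for annote, name in found:
    print("Annotation:", annote)
    print("Name:", name)
```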

Non-greedy dotall regex in Python