Question

I am parsing Java source code with Python and need to extract the comment text from the source. I have tried the following.

Take 1:

cmts = re.findall(r'/\*\*(.|[\r\n])*?\*/', lines)

Returns blanks: [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

Take 2 (added grouping parentheses around the whole regex):

cmts = re.findall(r'(/\*\*(.|[\r\n])*?\*/)', lines)

Returns:

Single-line comment (example only):

('/**\n\n * initialise the tag with the colors and the tag name\n\n */', ' ')

Multi-line comment (example only):

('/**\n\n * Get the color related to a specified tag\n\n * @param tag the tag that we want to get the colour for\n\n * @return color of the tag in String\n\n */', ' ')

I am only interested in "initialise the tag with the colors and the tag name", or in "Get the color related to a specified tag, @param tag the tag that we want to get the colour for, @return color of the tag in String". I can't get my head around it. Please give me some pointers!

Answer 1

To extract the comments (everything between /** and */), you can use:

re.findall(r'/\*\*(.*?)\*/', text, re.S)

(Note how the capture group can be simplified when re.S / re.DOTALL is used, since the dot then matches newlines as well.)
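As background on why the attempts in the question returned blanks: when a pattern contains a capturing group, re.findall returns the group contents rather than the whole match, and a repeated group keeps only its last iteration. The sketch below illustrates this; the sample string is made up for the illustration and is not taken from the asker's source file.

```python
import re

# Hypothetical sample input, modeled on the comment shown in the question.
sample = "/**\n * initialise the tag\n */\nclass Tag {}"

# A repeated capturing group keeps only its last iteration, so findall
# returns just the final character before "*/" (a space here).
take1 = re.findall(r'/\*\*(.|[\r\n])*?\*/', sample)

# A non-capturing group (?:...) makes findall return the whole match.
whole = re.findall(r'/\*\*(?:.|[\r\n])*?\*/', sample)

# With re.S the dot matches newlines too, so a single capturing group
# around the body is enough.
body = re.findall(r'/\*\*(.*?)\*/', sample, re.S)

print(take1)  # [' ']
print(whole)  # ['/**\n * initialise the tag\n */']
print(body)   # ['\n * initialise the tag\n ']
```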

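Then, for each match, you can collapse the runs of spaces and asterisks into a single space and replace each run of \n with a comma: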

import re

def comments(text):
    # re.S lets the dot match newlines, so one group captures the whole body.
    for comment in re.findall(r'/\*\*(.*?)\*/', text, re.S):
        yield re.sub('\n+', ',', re.sub(r'[ *]+', ' ', comment).strip())

For example:

>>> list(comments('/**\n\n     * Get the color related to a specified tag\n\n     * @param tag the tag that we want to get the colour for\n\n     * @return color of the tag in String\n\n     */'))
['Get the color related to a specified tag, @param tag the tag that we want to get the colour for, @return color of the tag in String']
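One caveat: the substitution r'[ *]+' also collapses any asterisks that appear inside the comment text itself. If that matters, one alternative is to clean each line separately and strip only the leading run of asterisks. This is a minimal sketch, not part of the answer above: javadoc_lines is a hypothetical helper name, and the pattern is assumed to anchor on the full /** opener.

```python
import re

def javadoc_lines(text):
    """Yield, for each /** ... */ block, a list of its cleaned lines."""
    for body in re.findall(r'/\*\*(.*?)\*/', text, re.S):
        cleaned = []
        for line in body.splitlines():
            # Remove leading whitespace and the leading asterisk run only;
            # asterisks later in the line are left untouched.
            line = re.sub(r'^\s*\*+\s?', '', line).strip()
            if line:  # skip lines that are empty after cleaning
                cleaned.append(line)
        yield cleaned

# Shortened sample input, made up for the illustration.
src = '/**\n\n * Get the color related to a specified tag\n\n * @param tag the tag\n\n */'
print(list(javadoc_lines(src)))
# [['Get the color related to a specified tag', '@param tag the tag']]
```

Returning a list per comment keeps the description and each @param / @return line separate, so the caller can choose how to join them, for example with ', '.join(lines).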
