Question

I can get an AST without comments using:

import ast
module = ast.parse(open('/path/to/module.py').read())

Could you show an example of how to get an AST with the comments (and whitespace) preserved?

Answer 1

The ast module doesn't include comments. The tokenize module can give you the comments, but it doesn't provide any other program structure.
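For the comments half, a minimal sketch with the standard library's tokenize module could look like this. The helper name extract_comments is made up for illustration; it returns each comment with its line and column, which is roughly what you would need to match comments back to AST nodes yourself.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line, column, text) for every comment token in source."""
    comments = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            line, column = tok.start
            comments.append((line, column, tok.string))
    return comments


source = "x = 1  # one\n# two\ny = 2\n"
print(extract_comments(source))
```

Because this works on tokens, a '#' inside a string literal is not reported as a comment.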

Answer 2

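An AST that keeps information about formatting, comments, etc. is called a Full Syntax Tree.

redbaron is able to do this. Install it with pip install redbaron and try the following code.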

import redbaron

with open("/path/to/module.py", "r") as source_code:
    red = redbaron.RedBaron(source_code.read())

print(red.fst())

Answer 3

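This question naturally arises when writing any kind of Python code beautifier, pep-8 checker, etc. In such cases you are doing source-to-source transformations, you do expect the input to be written by humans, and you want the output not only to be human-readable but also to:

1. include all comments, exactly where they appear in the original.
2. output the exact spelling of strings, including docstrings, as in the original.

This is far from easy to do with the ast module. You could call it a hole in the api, but there seems to be no easy way to extend the api to do 1 and 2 easily.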

Python to Coffeescript converter

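The TokenSync (ts) class, starting at line 1305 in py2cs.py, coordinates communication between the token-based data and the ast traversal. Given the source string s, the TokenSync class tokenizes s and sets up internal data structures that support several interface methods:

ts.leading_lines(node): returns a list of the preceding comment and blank lines.

ts.trailing_comment(node): returns a string containing the trailing comment for the node, if any.

ts.sync_string(node): returns the spelling of the string at the given node.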
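To make the idea concrete, here is a minimal standalone sketch of two such lookups using only the standard library. It is not the TokenSync code from py2cs.py: the function names and the line-number-based matching are assumptions for illustration, and it only looks at the first line of each node.

```python
import ast
import io
import tokenize


def comment_map(source):
    """Map line number -> comment text, using tokenize."""
    readline = io.StringIO(source).readline
    return {tok.start[0]: tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT}


def trailing_comment(source, node):
    """Return the comment on the node's first line, or ''."""
    return comment_map(source).get(node.lineno, '')


def leading_lines(source, node):
    """Return the blank and comment-only lines directly above the node."""
    lines = source.splitlines()
    comments = comment_map(source)
    result = []
    i = node.lineno - 1  # 1-based number of the line above the node
    while i >= 1:
        text = lines[i - 1]
        is_blank = not text.strip()
        is_comment = i in comments and text.lstrip().startswith('#')
        if not (is_blank or is_comment):
            break
        result.append(text)
        i -= 1
    result.reverse()
    return result


source = "x = 1\n\n# explain the branch\nif x:  # trailing\n    pass\n"
if_node = ast.parse(source).body[1]
print(leading_lines(source, if_node))
print(trailing_comment(source, if_node))
```

A real implementation would tokenize once and keep the result around rather than re-tokenizing on every call.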

For visitors, using the ts methods is straightforward, but just a bit clumsy. Here are some examples from the CoffeeScriptTraverser (cst) class in py2cs.py:

def do_Str(self, node):
    '''A string constant, including docstrings.'''
    if hasattr(node, 'lineno'):
        return self.sync_string(node)

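This works provided that ast.Str nodes are visited in the order they appear in the sources. This happens naturally in most traversals.

Here is the ast.If visitor. It shows how to use ts.leading_lines and ts.trailing_comment: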
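As a side note, on Python 3.8 and later the standard library can recover the exact spelling of a string node without a separate token stream, via ast.get_source_segment. This is a different technique from sync_string, shown here only as a sketch; string_spellings is a hypothetical helper name, string literals appear as ast.Constant on these versions, and f-strings are not handled.

```python
import ast


def string_spellings(source):
    """Return (value, exact source text) for each string constant, in source order."""
    tree = ast.parse(source)
    nodes = [n for n in ast.walk(tree)
             if isinstance(n, ast.Constant) and isinstance(n.value, str)]
    nodes.sort(key=lambda n: (n.lineno, n.col_offset))
    return [(n.value, ast.get_source_segment(source, n)) for n in nodes]


# The source text spells the string with an escape sequence and double quotes.
source = 's = "caf\\u00e9"\nt = \'x\'\n'
for value, spelling in string_spellings(source):
    print(repr(value), '<-', spelling)
```

The first element of each pair is the evaluated value, the second is the text as it was typed, quotes and escapes included.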

def do_If(self, node):

    result = self.leading_lines(node)
    tail = self.trailing_comment(node)
    s = 'if %s:%s' % (self.visit(node.test), tail)
    result.append(self.indent(s))
    for z in node.body:
        self.level += 1
        result.append(self.visit(z))
        self.level -= 1
    if node.orelse:
        tail = self.tail_after_body(node.body, node.orelse, result)
        result.append(self.indent('else:' + tail))
        for z in node.orelse:
            self.level += 1
            result.append(self.visit(z))
            self.level -= 1
    return ''.join(result)

The ts.tail_after_body method compensates for the fact that there are no ast nodes representing 'else' clauses. It's not rocket science, but it isn't pretty:

def tail_after_body(self, body, aList, result):
    '''
    Return the tail of the 'else' or 'finally' statement following the given body.
    aList is the node.orelse or node.finalbody list.
    '''
    node = self.last_node(body)
    if node:
        max_n = node.lineno
        leading = self.leading_lines(aList[0])
        if leading:
            result.extend(leading)
            max_n += len(leading)
        tail = self.trailing_comment_at_lineno(max_n + 1)
    else:
        tail = '\n'
    return tail

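Note that cst.tail_after_body just calls ts.tail_after_body.

Summary

The TokenSync class encapsulates most of the complexities involved in making token-oriented data available to ast traversal code. Using the TokenSync class is straightforward, but the ast visitors for all Python statements (and ast.Str) must include calls to ts.leading_lines, ts.trailing_comment and ts.sync_string. Furthermore, the ts.tail_after_body hack is needed to handle "missing" ast nodes.

In short, the code works well, but is a bit clumsy.

@Andrei: your short answer might suggest that you know of a more elegant way. If so, I would love to see it.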

Edward K. Ream

Answer 4

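Some people have already mentioned lib2to3, but I wanted to create a more complete answer, because this tool is an under-appreciated gem. Don't bother with redbaron.

lib2to3 is made up of a few parts:

the parser: tokens, grammar, etc.
fixers: a library of transformations
refactoring tools: apply fixers to the parsed tree
the command line: choose the fixes to apply and run them in parallel using multiprocessing

Below is a brief introduction to using lib2to3 for transformations and for scraping data (i.e. extraction).

Transformations

If you want to transform python files (i.e. complex find/replace), the CLI provided by lib2to3 is full-featured and can transform files in parallel.

To use it, create a python package where each submodule contains a single subclass of lib2to3.fixer_base.BaseFix. See lib2to3.fixes for lots of examples.

Then create your executable script (replacing "myfixes" with the name of your package):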

import sys
import lib2to3.main

def main(args=None):
    sys.exit(lib2to3.main.main("myfixes", args=args))

if __name__ == '__main__':
    main()

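Run yourscript -h to see the options.

Scraping

If your goal is to collect data rather than transform it, then you need to do a bit more work. Here's the recipe I use to do data scraping with lib2to3: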

# file: basescraper.py
from __future__ import absolute_import, print_function

from lib2to3.pgen2 import token
from lib2to3.pgen2.parse import ParseError
from lib2to3.pygram import python_grammar
from lib2to3.refactor import RefactoringTool
from lib2to3 import fixer_base


def symbol_name(number):
    """
    Get a human-friendly name from a token or symbol

    Very handy for debugging.
    """
    try:
        return token.tok_name[number]
    except KeyError:
        return python_grammar.number2symbol[number]


class SimpleRefactoringTool(RefactoringTool):
    def __init__(self, scraper_classes, options=None, explicit=None):
        self.fixers = None
        self.scraper_classes = scraper_classes
        # first argument is a list of fixer paths, as strings. we override
        # get_fixers, so we don't need it.
        super(SimpleRefactoringTool, self).__init__(None, options, explicit)

    def get_fixers(self):
        """
        Override base method to get fixers from passed fixers classes instead
        of via dotted-module-paths.
        """
        self.fixers = [cls(self.options, self.fixer_log)
                       for cls in self.scraper_classes]
        return (self.fixers, [])

    def get_results(self):
        """
        Get the scraped results returned from `scraper_classes`
        """
        return {type(fixer): fixer.results for fixer in self.fixers}


class BaseScraper(fixer_base.BaseFix):
    """
    Base class for a fixer that stores results.

    lib2to3 was designed with transformation in mind, but if you just want
    to scrape results, you need a way to pass data back to the caller.
    """
    BM_compatible = True

    def __init__(self, options, log):
        self.results = []
        super(BaseScraper, self).__init__(options, log)

    def scrape(self, node, match):
        raise NotImplementedError

    def transform(self, node, match):
        result = self.scrape(node, match)
        if result is not None:
            self.results.append(result)


def scrape(code, scraper):
    """
    Simple interface when you have a single scraper class.
    """
    tool = SimpleRefactoringTool([scraper])
    tool.refactor_string(code, '<test.py>')
    return tool.get_results()[scraper]

Here's a simple scraper that finds the first comment after a function def:

# file: commentscraper.py
from lib2to3.pgen2 import token  # needed for token.INDENT below

from basescraper import scrape, BaseScraper, ParseError

class FindComments(BaseScraper):

    PATTERN = """
    funcdef< 'def' name=any parameters< '(' [any] ')' >
           ['->' any] ':' suite=any+ >
    """

    def scrape(self, node, results):
        suite = results["suite"]
        name = results["name"]

        if suite[0].children[1].type == token.INDENT:
            indent_node = suite[0].children[1]
            return (str(name), indent_node.prefix.strip())
        else:
            # e.g. "def foo(...): x = 5; y = 7"
            # nothing to save
            return

# example usage:

code = '''\

@decorator
def foobar():
    # type: comment goes here
    """
    docstring
    """
    pass

'''
comments = scrape(code, FindComments)
assert comments == [('foobar', '# type: comment goes here')]

Answer 5

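LibCST provides a Concrete Syntax Tree for Python that looks and feels like an AST. Most node types are the same as in the AST, but with formatting information (comments, whitespace, commas, etc.) available. https://github.com/Instagram/LibCST/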

In [1]: import libcst as cst

In [2]: cst.parse_statement("fn(1, 2)  # a comment")
Out[2]:
SimpleStatementLine(
    body=[
        Expr(
            value=Call(
                func=Name(
                    value='fn',
                    lpar=[],
                    rpar=[],
                ),
                args=[
                    Arg(
                        value=Integer(
                            value='1',
                            lpar=[],
                            rpar=[],
                        ),
                        keyword=None,
                        equal=MaybeSentinel.DEFAULT,
                        comma=Comma(        # <--- a comma
                            whitespace_before=SimpleWhitespace(
                                value='',
                            ),
                            whitespace_after=SimpleWhitespace(
                                value=' ',  # <--- a white space
                            ),
                        ),
                        star='',
                        whitespace_after_star=SimpleWhitespace(
                            value='',
                        ),
                        whitespace_after_arg=SimpleWhitespace(
                            value='',
                        ),
                    ),
                    Arg(
                        value=Integer(
                            value='2',
                            lpar=[],
                            rpar=[],
                        ),
                        keyword=None,
                        equal=MaybeSentinel.DEFAULT,
                        comma=MaybeSentinel.DEFAULT,
                        star='',
                        whitespace_after_star=SimpleWhitespace(
                            value='',
                        ),
                        whitespace_after_arg=SimpleWhitespace(
                            value='',
                        ),
                    ),
                ],
                lpar=[],
                rpar=[],
                whitespace_after_func=SimpleWhitespace(
                    value='',
                ),
                whitespace_before_args=SimpleWhitespace(
                    value='',
                ),
            ),
            semicolon=MaybeSentinel.DEFAULT,
        ),
    ],
    leading_lines=[],
    trailing_whitespace=TrailingWhitespace(
        whitespace=SimpleWhitespace(
            value='  ',
        ),
        comment=Comment(
            value='# a comment',  # <--- comment
        ),
        newline=Newline(
            value=None,
        ),
    ),
)

Answer 6

If you're using python 3, you can use bowler, which is based on lib2to3 but provides a much nicer API and CLI for creating transformation scripts.

https://pybowler.io/

Answer 7

Other experts seem to think the Python AST module strips comments, which means that route simply isn't going to work for you.
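That claim is easy to check with the standard library alone. The sketch below assumes Python 3.9 or later, where ast.unparse is available, and round-trips a one-line program through the ast module:

```python
import ast

source = "x = 1  # note\n"

# Parse to an AST, then regenerate source text from the tree alone.
roundtrip = ast.unparse(ast.parse(source))

# The assignment survives, but the comment has nowhere to live in the tree.
print(roundtrip)
```

The regenerated text contains the assignment but no trace of the comment, and the original spacing is normalized as well.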

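Our DMS Software Reengineering Toolkit with its Python front end will parse Python and build ASTs that capture all the comments (see this SO example). The Python front end includes a prettyprinter that can regenerate Python code (with the comments!) directly from the AST. DMS itself provides the low-level parsing machinery, and a source-to-source transformation capability that operates on patterns written using the surface syntax of the target language (e.g., Python).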

Python AST with preserved comments
