Question

I have a Sphinx-formatted docstring and want to extract its different parts (param, return, type, rtype, etc.) for further processing. How can I do this?

Answer 1

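You can use docutils, which Sphinx is built on. In another answer I used docutils.core.publish_doctree to get an XML representation of a reStructuredText document (in practice, a string of text) and then extracted the field lists from that XML with xml.minidom methods. An alternative is xml.etree.ElementTree, which is in my opinion much easier to use.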

First of all, however, whenever docutils encounters a block of reStructuredText like

:param x: Some parameter

the resulting XML representation is (I know, it's very verbose):

<field_list>
    <field>
        <field_name>
            param x
        </field_name>
        <field_body>
            <paragraph>
                Some parameter
            </paragraph>
        </field_body>
    </field>
</field_list>
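To make that structure concrete, here is a minimal sketch that walks the XML above with xml.etree.ElementTree alone. The XML is pasted in as a literal string purely for illustration (the name xml_text is made up here); with docutils it would come from the document tree instead.

```python
import xml.etree.ElementTree as etree

# The verbose XML shown above, pasted in as a literal string (illustration only).
xml_text = """
<field_list>
    <field>
        <field_name>
            param x
        </field_name>
        <field_body>
            <paragraph>
                Some parameter
            </paragraph>
        </field_body>
    </field>
</field_list>
"""

field_list = etree.fromstring(xml_text)

pairs = []
for field in field_list.findall("field"):
    # findtext() returns the text directly inside <field_name>
    name = field.findtext("field_name").strip()
    # itertext() walks every text node below <field_body>, whatever the nesting
    body = "".join(field.find("field_body").itertext()).strip()
    pairs.append((name, body))

print(pairs)
```

The strip() calls are only needed because the pretty-printed XML carries indentation whitespace inside each element.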

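The following code collects all field_list elements in a document and pairs the text of each field/field_name with its field/field_body (serialized as XML, including the paragraph inside), putting them into a list as 2-tuples. You can then manipulate that however you like.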

from docutils.core import publish_doctree
import xml.etree.ElementTree as etree

source = """Some help text

:param x: some parameter
:type x: and it's type

:return: Some text
:rtype: Return type

Some trailing text. I have no idea if the above is valid Sphinx
documentation!
"""

doctree = publish_doctree(source).asdom()

# Convert to etree.ElementTree since this is easier to work with than
# xml.minidom
doctree = etree.fromstring(doctree.toxml())

# Get all field lists in the document.
field_lists = doctree.findall('field_list')

fields = [f for field_list in field_lists
          for f in field_list.findall('field')]

field_names = [name.text for field in fields
               for name in field.findall('field_name')]

# encoding='unicode' makes tostring() return str rather than bytes on Python 3
field_text = [etree.tostring(element, encoding='unicode') for field in fields
              for element in field.findall('field_body')]

print(list(zip(field_names, field_text)))

This yields the list:

[('param x', '<field_body><paragraph>some parameter</paragraph></field_body>'),
 ('type x', "<field_body><paragraph>and it's type</paragraph></field_body>"), 
 ('return', '<field_body><paragraph>Some text</paragraph></field_body>'), 
 ('rtype', '<field_body><paragraph>Return type</paragraph></field_body>')]

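So the first item in each tuple is the field list item (i.e. :return:, :param x:, etc.) and the second is the corresponding text. Obviously this text isn't the cleanest output, but the code above is easy to modify, so I leave it to the OP to get the exact output they want.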
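As one possible way to tidy this up, the sketch below takes pairs in the shape listed above, strips the XML tags from each body, and splits the field name into its kind and argument. The helper name clean_field is hypothetical, and the list is copied from the output above so the snippet runs on its own.

```python
import xml.etree.ElementTree as etree

# Pairs in the shape shown in the output above (copied here for illustration).
pairs = [
    ("param x", "<field_body><paragraph>some parameter</paragraph></field_body>"),
    ("type x", "<field_body><paragraph>and it's type</paragraph></field_body>"),
    ("return", "<field_body><paragraph>Some text</paragraph></field_body>"),
    ("rtype", "<field_body><paragraph>Return type</paragraph></field_body>"),
]


def clean_field(name, body_xml):
    """Hypothetical helper: split 'param x' into kind and argument, drop XML tags."""
    # 'param x' -> ('param', 'x'); 'return' -> ('return', '')
    kind, _, arg = name.partition(" ")
    # Join all text nodes inside <field_body>, ignoring the markup around them
    text = "".join(etree.fromstring(body_xml).itertext()).strip()
    return kind, arg or None, text


cleaned = [clean_field(name, body) for name, body in pairs]
for row in cleaned:
    print(row)
```

Note that this splits on the first space only, so a combined field such as :param int x: would end up with "int x" as its argument and would need extra handling.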
