Question

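I'm trying to insert some import lines into a Python source file, but ideally I'd like to place them right after the initial docstring. Suppose I load the file into a lines variable like this: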

lines = open('filename.py').readlines()

How do I find the line number where the docstring ends?

Answer 1

Rather than using a regex or relying on specific formatting, you can use Python's tokenize module.

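import tokenize

filename = 'filename.py'
insert_index = None
with open(filename) as f:
    for tok, text, (srow, scol), (erow, ecol), line in tokenize.generate_tokens(f.readline):
        if tok in (tokenize.COMMENT, tokenize.NL):
            # Skip comments and the non-logical newlines that follow them
            # (blank lines also produce NL tokens).
            continue
        elif tok == tokenize.STRING:
            # (row, column) where the docstring ends; rows are 1-based
            insert_index = erow, ecol
            break
        else:
            break  # No docstring found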

This way you can even handle pathological cases such as:

# Comment
# """Not the real docstring"""
' this is the module\'s \
docstring, containing:\
""" and having code on the same line following it:'; this_is_code=42

and they will be handled exactly as Python itself handles them.
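To tie this back to the original question, here is a minimal sketch of how the end row could be used to insert import lines. The helper name `insert_after_docstring` and its parameters are hypothetical. It assumes the line that closes the docstring carries no trailing code, and when no docstring is found it simply inserts at the top of the file, which would be wrong for a file that starts with a shebang or encoding comment.

```python
import io
import tokenize


def insert_after_docstring(lines, new_lines):
    """Return a copy of `lines` with `new_lines` placed after the module docstring.

    Hypothetical helper: if no docstring is found, the new lines go at the top.
    """
    insert_row = 0  # default: no docstring, insert at the top of the file
    readline = io.StringIO("".join(lines)).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and blank lines may precede the docstring
        if tok.type == tokenize.STRING:
            insert_row = tok.end[0]  # 1-based row on which the string ends
        break  # the first significant token decides either way
    return lines[:insert_row] + list(new_lines) + lines[insert_row:]


source = ['# Comment\n', '"""Module\n', 'docstring."""\n', 'x = 1\n']
result = insert_after_docstring(source, ['import os\n'])
```

Because token rows are 1-based, the end row of the docstring doubles as the 0-based list index of the first line after it, so no off-by-one adjustment is needed.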

Answer 2

If you are using the standard docstring format, you can do something like this:

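count = 0
for line in lines:
    if count < 2:
        # Before or inside the docstring: count the lines that open and close it.
        # Assumes the opening and closing """ each start a line of their own.
        if line.startswith('"""'):
            count += 1
        continue
    # Line is after docstring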

Some adjustment may be needed for files that have no docstring, but if your files are formatted consistently, this should be easy enough.
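One possible adjustment for files without a docstring is sketched below. The function name `docstring_end_index` is hypothetical, and the sketch assumes the "standard" layout: the docstring opens with triple double quotes at the start of a line and its closing quotes end a line. Anything less regular is better left to the tokenize approach.

```python
def docstring_end_index(lines):
    """Return the index of the first line after the docstring (0 if there is none).

    Assumes the docstring opens with triple double quotes at the start of a
    line and that the closing quotes are the last thing on their line.
    """
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comments before the docstring
        if not stripped.startswith('"""'):
            return 0  # the first real line is code: no docstring
        # One-line docstring: opening and closing quotes on the same line.
        if len(stripped) >= 6 and stripped.endswith('"""'):
            return i + 1
        # Multi-line docstring: look for the line that closes it.
        for j in range(i + 1, len(lines)):
            if lines[j].rstrip().endswith('"""'):
                return j + 1
        return 0  # unterminated docstring: treat it as absent
    return 0
```

With the returned index `n`, the import lines could then be spliced in with something like `lines[n:n] = new_imports`.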

Answer 3

Here is a function based on Brian's excellent answer that you can use to split a file into docstring and code:

def split_docstring_and_code(infile):

    import tokenize
    insert_index = None
    with open(infile) as f:
        for tok, text, (srow, scol), (erow, ecol), line in tokenize.generate_tokens(f.readline):
            if tok in (tokenize.COMMENT, tokenize.NL):
                # Skip comments and the non-logical newlines around them
                continue
            elif tok == tokenize.STRING:
                insert_index = erow, ecol
                break
            else:
                break  # No docstring found

    with open(infile) as f:
        lines = f.readlines()
    if insert_index is not None:
        # erow is 1-based, so it is also the index of the first line after the docstring
        erow = insert_index[0]
        return "".join(lines[:erow]), "".join(lines[erow:])
    else:
        return "", "".join(lines)

It assumes that the line ending the docstring contains no additional code after the closing string delimiter.
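If that assumption does not hold, the end column reported by the tokenizer can be used to split inside the line as well. The following is only a sketch: the name `split_at_docstring_end` is hypothetical, it takes the source text rather than a file name, and it leaves any `;` separator at the start of the code part for the caller to deal with.

```python
import io
import tokenize


def split_at_docstring_end(source):
    """Split source text into (docstring_part, code_part) at the exact end position."""
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and blank lines may precede the docstring
        if tok.type == tokenize.STRING:
            erow, ecol = tok.end  # 1-based row, 0-based column
            last = lines[erow - 1]  # the line on which the docstring ends
            head = "".join(lines[:erow - 1]) + last[:ecol]
            tail = last[ecol:] + "".join(lines[erow:])
            return head, tail
        break  # first significant token is not a string: no docstring
    return "", source


head, tail = split_at_docstring_end('"""Doc."""; x = 1\ny = 2\n')
```

Here the code part keeps the remainder of the closing line, so trailing statements on that line are not swallowed into the docstring part.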

How to skip the docstring using a regex
