Extract docstrings from a Python file using grep or awk

Time: 2017-10-27 09:59:07

Tags: python regex awk grep

I want to extract all docstrings from my Python file using grep or awk. I tried

cat test.py | grep """[\w\W]*?"""

but I get no output. Suppose test.py looks like this.

import libraries

class MyClass(object):
    """Docstring to this class. 
       second line of docstring."""

    def myClassMethod(a,b):
        """Docstring of the method. 
           another line in docstring of the method."""
        return a + b

The output should then be everything enclosed in triple quotes.

"""Docstring to this class. 
second line of docstring."""
"""Docstring of the method. 
another line in docstring of the method."""

2 answers:

Answer 0: (score: 1)

The proper way to extract docstrings from Python code is with an actual Python parser (the ast module):

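#!/usr/bin/env python
import ast

# Parse the source into a syntax tree; the file is not executed.
with open('/path/to/file') as f:
    code = ast.parse(f.read())

# Only functions, classes, and the module itself can carry a docstring.
for node in ast.walk(code):
    if isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.Module)):
        # get_docstring() returns None when the node has no docstring.
        docstring = ast.get_docstring(node)
        if docstring:
            print(repr(docstring))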

Running it on your sample outputs:

'Docstring to this class. \nsecond line of docstring.'
'Docstring of the method. \nanother line in docstring of the method.'
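If it helps to know which object each docstring belongs to, the same idea can be wrapped in a small generator. This is only a sketch: `iter_docstrings` and `SAMPLE` are names made up for illustration, and `ast.AsyncFunctionDef` is added on the assumption that `async def` functions should be covered too.

```python
import ast


def iter_docstrings(source):
    """Yield (node type, name, docstring) for each documented node."""
    tree = ast.parse(source)
    # Node types that can carry a docstring.
    targets = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, targets):
            doc = ast.get_docstring(node)  # None when there is no docstring
            if doc:
                # Module nodes have no "name" attribute, so use a placeholder.
                name = getattr(node, "name", "<module>")
                yield type(node).__name__, name, doc


# Hypothetical sample, modelled on the test.py from the question.
SAMPLE = '''
class MyClass(object):
    """Docstring to this class.
       second line of docstring."""

    def myClassMethod(a, b):
        """Docstring of the method.
           another line in docstring of the method."""
        return a + b
'''

if __name__ == "__main__":
    for kind, name, doc in iter_docstrings(SAMPLE):
        print(kind, name, repr(doc))
```

Note that `ast.get_docstring` strips the common indentation by default, which is why the continuation lines come out flush left rather than indented as in the source file.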

Just for fun, we can also do it with GNU awk:

$ awk -v RS= -v FPAT="'''.*'''|"'""".*"""' '{print $1}' file
"""Docstring to this class. 
       second line of docstring."""
"""Docstring of the method. 
           another line in docstring of the method."""

Answer 1: (score: 0)

With grep's -P (Perl-compatible regex) option, you can do the following:

grep -Poz '"""[^"]+"""' test.py

Output:

"""Docstring to this class. 
       second line of docstring.""""""Docstring of the method. 
           another line in docstring of the method."""
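The two matches run together because -z makes grep terminate each output record with a NUL byte instead of a newline, and `[^"]+` stops at the first double quote inside a docstring. For comparison, here is a rough Python sketch of the same regex idea that also accepts `'''` and returns each match separately. `find_triple_quoted` is a hypothetical helper, and like the grep version it matches any triple-quoted string, not only real docstrings.

```python
import re

# Opening delimiter is """ or '''; the backreference \1 requires the same
# delimiter to close. DOTALL lets "." cross line breaks, and the lazy ".*?"
# stops at the first closing delimiter.
TRIPLE_QUOTED = re.compile(r"(\"{3}|'{3})(.*?)\1", re.DOTALL)


def find_triple_quoted(text):
    """Return every triple-quoted string in text, delimiters included."""
    return [match.group(0) for match in TRIPLE_QUOTED.finditer(text)]


if __name__ == "__main__":
    sample = 'class A:\n    """Doc one.\n       line two."""\n'
    for found in find_triple_quoted(sample):
        print(found)
```

Because this is plain text matching, a triple-quoted string assigned to a variable is reported as well; the ast approach avoids that.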