Regex to capture classes and methods

Time: 2014-12-09 08:30:15

Tags: regex

How can I capture the classes and their methods from a Python file?

I don't care about attrs or args.

class MyClass_1(...):
    ...
    def method1_of_first_class(self):
        ...

    def method2_of_first_class(self):
        ...

    def method3_of_first_class(self):
        ...

class MyClass_2(...):
    ...
    def method1_of_second_class(self):
        ...

    def method2_of_second_class(self):
        ...

    def method3_of_second_class(self):
        ...

What I have tried so far:

class ([\w_]+?)\(.*?\):.*?(?:def ([\w_]+?)\(self.*?\):.*?)+?

Option: dot matches newline

Capturing the class:

Match the characters “class ” literally «class »
Match the regular expression below and capture its match into backreference number 1 «([\w_]+?)»
   Match a single character present in the list below «[\w_]+?»
      Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
      A word character (letters, digits, etc.) «\w»
      The character “_” «_»
Match the character “(” literally «\(»
Match any single character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match the character “:” literally «:»
Match any single character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

Capturing the methods:

Match the regular expression below «(?:def ([\w_]+?)\(self.*?\):.*?)+?»
   Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
   Match the characters “def ” literally «def »
   Match the regular expression below and capture its match into backreference number 2 «([\w_]+?)»
      Match a single character present in the list below «[\w_]+?»
         Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
         A word character (letters, digits, etc.) «\w»
         The character “_” «_»
   Match the character “(” literally «\(»
   Match the characters “self” literally «self»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “)” literally «\)»
   Match the character “:” literally «:»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

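But it only captures the class name and the first method. I think this is because backreference 2 can't hold more than one value, even though it is inside (?:myregex)+?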
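The suspicion is right: a capturing group inside a repeated group holds only one value per match (Python's re keeps the capture from the last iteration that ran). In the question's pattern the repetition also stops after the first def, because once that def is matched the rest of the pattern is already satisfied. A minimal sketch illustrating both points with Python's re module; SAMPLE is a cut-down, hypothetical version of the question's file and QUESTION_RE is just the question's regex compiled with re.DOTALL.

```python
import re

# A capturing group inside a repetition keeps only one value:
# Python reports the capture from the last iteration that ran.
print(re.match(r'(?:(\w)-)+', 'a-b-c-').group(1))  # c

# Without the surrounding repetition, findall returns every capture.
print(re.findall(r'(\w)-', 'a-b-c-'))  # ['a', 'b', 'c']

# The question's pattern with "dot matches newline" (re.DOTALL).
QUESTION_RE = re.compile(
    r'class ([\w_]+?)\(.*?\):.*?(?:def ([\w_]+?)\(self.*?\):.*?)+?',
    re.DOTALL,
)

# Hypothetical cut-down input in the shape of the question's file.
SAMPLE = (
    "class MyClass_1(object):\n"
    "    def method1_of_first_class(self):\n"
    "        pass\n"
    "    def method2_of_first_class(self):\n"
    "        pass\n"
    "class MyClass_2(object):\n"
    "    def method1_of_second_class(self):\n"
    "        pass\n"
)

# Each match yields exactly one (class, method) pair, as in the current output.
print(QUESTION_RE.findall(SAMPLE))
# [('MyClass_1', 'method1_of_first_class'), ('MyClass_2', 'method1_of_second_class')]
```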

Current output:

'MyClass_1':'method1_of_first_class',
'MyClass_2':'method1_of_second_class'

Desired output:

'MyClass_1':['method1_of_first_class','method2_of_first_class',...],
'MyClass_2':['method1_of_second_class','method2_of_second_class',...]
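If the file is simple enough, one regex-only workaround is to stop collecting methods inside a single match: first find each top-level class header, then run a second, simpler pattern over the text between that header and the next one. This is a minimal sketch, assuming classes start at column 0, methods are indented, and classes are not nested; CLASS_RE, METHOD_RE and classes_and_methods are hypothetical names, not part of the question.

```python
import re

# Top-level class headers must start at column 0; method headers must be indented.
CLASS_RE = re.compile(r'^class\s+(\w+)', re.MULTILINE)
METHOD_RE = re.compile(r'^[ \t]+def\s+(\w+)', re.MULTILINE)


def classes_and_methods(source):
    """Map each top-level class name to the indented defs that follow it.

    Assumes a class body runs until the next top-level 'class' line and that
    classes are not nested; any indented def in that span (including a helper
    nested inside a method) is attributed to the class.
    """
    headers = list(CLASS_RE.finditer(source))
    result = {}
    for i, header in enumerate(headers):
        # The body is everything between this header and the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(source)
        body = source[header.end():end]
        result[header.group(1)] = METHOD_RE.findall(body)
    return result


# Hypothetical cut-down input in the shape of the question's file.
SAMPLE = (
    "class MyClass_1(object):\n"
    "    def method1_of_first_class(self):\n"
    "        pass\n"
    "    def method2_of_first_class(self):\n"
    "        pass\n"
    "class MyClass_2(object):\n"
    "    def method1_of_second_class(self):\n"
    "        pass\n"
)

print(classes_and_methods(SAMPLE))
# {'MyClass_1': ['method1_of_first_class', 'method2_of_first_class'], 'MyClass_2': ['method1_of_second_class']}
```

The two-pass split avoids the single-capture limit because each method name comes from its own findall match rather than from a repeated group.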

2 Answers:

Answer 0 (score: 2)

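Since a class can contain another class or function, and a function can contain another function or class, simply grabbing the class and function declarations with a regex loses the hierarchy information.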
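To make the point concrete, here is a minimal sketch that scans a hypothetical nested snippet with a flat declaration regex (DECL_RE and NESTED are illustrative names, not from the answer). Every declaration is found, but the result is a flat sequence, so the nesting could only be guessed back from indentation.

```python
import re

# Hypothetical input: a nested function inside a method, and a nested class.
NESTED = (
    "class Outer:\n"
    "    def method(self):\n"
    "        def helper():\n"
    "            pass\n"
    "    class Inner:\n"
    "        def inner_method(self):\n"
    "            pass\n"
)

# Matches any class/def header at any indentation level.
DECL_RE = re.compile(r'^[ \t]*(class|def)\s+(\w+)', re.MULTILINE)

# Nothing in this output says that helper() lives inside method(),
# or that inner_method belongs to Inner rather than Outer.
print(DECL_RE.findall(NESTED))
# [('class', 'Outer'), ('def', 'method'), ('def', 'helper'), ('class', 'Inner'), ('def', 'inner_method')]
```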

In particular, pydoc.py in a Python installation (available since version 2.1) is a prime example of such a case.

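Parsing Python code from Python is easy, because Python ships with built-in parsers: the parser module (deprecated in 3.9 and removed in Python 3.10) and, since version 2.6, the ast module.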

Here is sample code that parses Python code with the ast module (Python 2.6 and later):

from ast import *
import sys

# Read the Python file named on the command line; the file is closed automatically.
with open(sys.argv[1]) as fi:
    source = fi.read()

parse_tree = parse(source)

class Node:
    def __init__(self, node, children):
        self.node = node
        self.children = children

    def __repr__(self):
        return "{{{}: {}}}".format(self.node, self.children)

class ClassVisitor(NodeVisitor):
    def visit_ClassDef(self, node):
        # print(node, node.name)

        r = self.generic_visit(node)
        return Node(("class", node.name), r)

    def visit_FunctionDef(self, node):
        # print(node, node.name)

        r = self.generic_visit(node)
        return Node(("function", node.name), r)


    def generic_visit(self, node):
        """Called if no explicit visitor function exists for a node."""
        node_list = []

        def add_child(nl, children):
            if children is None:
                pass
                # Disable the 2 lines below if you need more scoping information
            elif type(children) is list:
                nl += children
            else:
                nl.append(children)

        for field, value in iter_fields(node):
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, AST):
                        add_child(node_list, self.visit(item))
            elif isinstance(value, AST):
                add_child(node_list, self.visit(value))

        return node_list if node_list else None

print(ClassVisitor().visit(parse_tree))

The code has been tested with Python 2.7 and Python 3.2.

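Because the default implementation of generic_visit doesn't return anything, I copied the source of generic_visit and modified it to pass the return values back to the caller.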
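If only the flat class-to-methods mapping asked for in the question is needed, walking the top level of the tree may be enough, with no NodeVisitor subclass at all. This is a minimal sketch under that assumption: class_methods is a hypothetical name, only functions defined directly in a top-level class body are collected, and ast.AsyncFunctionDef requires Python 3.5 or later.

```python
import ast


def class_methods(source):
    """Map each top-level class name to the functions defined directly in its body.

    Nested classes and functions nested inside methods are ignored.
    """
    tree = ast.parse(source)
    result = {}
    for node in tree.body:  # module-level statements only
        if isinstance(node, ast.ClassDef):
            result[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return result


# Hypothetical input; helper() is nested inside a method, so it is not listed.
SAMPLE = (
    "class MyClass_1(object):\n"
    "    def method1_of_first_class(self):\n"
    "        def helper():\n"
    "            pass\n"
    "    def method2_of_first_class(self):\n"
    "        pass\n"
    "\n"
    "class MyClass_2(object):\n"
    "    def method1_of_second_class(self):\n"
    "        pass\n"
)

print(class_methods(SAMPLE))
# {'MyClass_1': ['method1_of_first_class', 'method2_of_first_class'], 'MyClass_2': ['method1_of_second_class']}
```

Unlike the regex approaches, the parser decides which body a def belongs to, so a nested helper is not mistaken for a method.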

Answer 1 (score: 0)

You can start with this regex:

/class\s(\w+)|def\s(\w+)/gm

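This will match all class and method names. To arrange them into the structure you mentioned in your comment, you will probably need to use your implementation language.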
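The same grouping idea as the PHP example in this answer can be sketched in Python. Python has no /g flag: re.finditer already scans the whole string, and the m flag is dropped because the pattern has no ^ or $ anchors. NAME_RE and group_by_class are hypothetical names; like the PHP version, this attributes every def to the most recent class, so nested helpers and module-level functions after a class end up in that class's list.

```python
import re

# The answer's pattern; exactly one of the two groups participates in each match.
NAME_RE = re.compile(r'class\s(\w+)|def\s(\w+)')


def group_by_class(source):
    """Group def names under the most recent class name, like the PHP loop."""
    output = {}
    current = None
    for m in NAME_RE.finditer(source):
        if m.group(1) is not None:   # the 'class' alternative matched
            current = m.group(1)
            output[current] = []
        elif current is not None:    # a def before any class is skipped
            output[current].append(m.group(2))
    return output


# Hypothetical cut-down input in the shape of the question's file.
SAMPLE = (
    "class MyClass_1(object):\n"
    "    def method1_of_first_class(self):\n"
    "        pass\n"
    "    def method2_of_first_class(self):\n"
    "        pass\n"
    "class MyClass_2(object):\n"
    "    def method1_of_second_class(self):\n"
    "        pass\n"
)

print(group_by_class(SAMPLE))
# {'MyClass_1': ['method1_of_first_class', 'method2_of_first_class'], 'MyClass_2': ['method1_of_second_class']}
```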

Edit: here's a PHP implementation example:

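// Assumption: $source holds the text of the Python file, e.g. from file_get_contents().
// preg_match_all() already finds every match, so the /g flag is dropped (PHP rejects it).
preg_match_all('/class\s(\w+)|def\s(\w+)/m', $source, $match_array);

$output = array();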

foreach ($match_array[0] as $key => $value) {
    if (substr($value, 0, 5) === 'class') {
        $output[$value] = array();
        $parent_key = $value;
        continue;
    }
    $output[$parent_key][] = $value;
}

// print_r($output);

foreach ($output as $parent => $values) {
    echo '[' . $parent . ', [' . implode(',', $values) . ']]' . PHP_EOL;
}

Sample output:

[class MyClass_1, [def method1_of_first_class,def method2_of_first_class,def method3_of_first_class]]
[class MyClass_2, [def method1_of_second_class,def method2_of_second_class,def method3_of_second_class]]