Python - regex - find function names but not function calls

Time: 2013-04-23 14:34:11

Tags: python regex

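OK, so I have a bunch of C and C++ code that I need to filter through to find function definitions. I don't know the function's type or return value, and I don't know the number of parameters in the function definition or in the function calls, etc.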

So far I have:

import re, sys
from os.path import abspath
from os import walk

function = 'msg'
regexp = r"(" + function + ".*[^;]){"

found = False
for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(abspath(root + '/' + filename)) as fh:
            data = fh.read()
            result = re.findall(regexp, data)
            if len(result) > 0:
                sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
                sys.stdout.flush()
    break
However, this produces some unwanted results. The regex needs to be tolerant of layout variations such as the ones below.

Find the "msg" definition, but not the "msg()" call, in all variations of, say:

void
shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

void shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

void shapex_msg (struct shaper *s) {
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}

1 answer:

Answer 0 (score: 1)

Perhaps something like the following regex:

import re

def make_regex(name):
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

Testing it on your examples:

>>> text = '''
... void
... shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s)
... {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }
... 
... void shapex_msg (struct shaper *s) {
...   msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
...        s->bytes_per_second);
... }'''
>>> shapex_msg = make_regex('shapex_msg')
>>> shapex_msg.findall(text)
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {']

It also works with definitions spread over several lines:

>>> shapex_msg.findall('''int
...         shapex_msg      (
... int a,
... int b
... )  
... 
...         {''')
['\n        shapex_msg      (\nint a,\nint b\n)  \n\n        {']

Function calls, on the other hand, are not matched:

>>> shapex_msg.findall('shapex_msg(1,2,3);')
[]
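
One thing the pattern above does not guard against: it is not anchored on a word boundary, so a search for "msg" would also match the tail of "shapex_msg (...) {". Below is a minimal sketch of a stricter variant, assuming a leading \b suits the identifiers you search for. The name make_regex_strict and the sample text are hypothetical, used only for illustration.

```python
import re

def make_regex_strict(name):
    # Hypothetical variant of make_regex: the leading \b stops 'msg'
    # from matching inside a longer identifier such as 'shapex_msg'.
    return re.compile(r'\b%s\s*\([^;)]*\)\s*\{' % re.escape(name))

# Made-up sample: one definition that calls msg, plus a definition of msg itself.
sample = '''
void shapex_msg (struct shaper *s) {
  msg (M_INFO, "initialized");
}

void msg (int level, const char *text)
{
}
'''

# Only the real definition of msg is reported, not the call and not shapex_msg.
print(make_regex_strict('msg').findall(sample))
```

Like the original pattern, this stops at the first closing parenthesis, so a parameter list that itself contains parentheses (a function-pointer argument, for example) would not be matched.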

As a side note, your regex doesn't work because .* is greedy, so it doesn't stop at the correct closing parenthesis.
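
To see the effect concretely, here is a small sketch that runs the pattern from the question and the pattern from this answer on a line where a call is followed by an unrelated block. The input line is made up for illustration.

```python
import re

def make_regex(name):
    # Same pattern as in the answer above.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

# Pattern from the question, built for function = 'msg'.
question_regex = re.compile(r"(" + "msg" + ".*[^;]){")

# Hypothetical input: a call to msg, then an 'if' block on the same line.
line = 'msg(1); if (ready) {'

# The greedy .* runs past the ';' up to the last '{' on the line,
# so the call is reported as if it were a definition.
print(question_regex.findall(line))

# [^;)]* cannot cross ')' or ';', so the call is rejected.
print(make_regex('msg').findall(line))
```

The first call prints a one-element list containing the whole "msg(1); if (ready) " span, while the second prints an empty list.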