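OK, so I have a bunch of C and C++ code that I need to filter through to find function definitions. I don't know the function type/return value, and I don't know the number of parameters in the function definition or function call, etc.
So far I have: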
import re, sys
from os.path import join
from os import walk
function = 'msg'
# Intended to match a definition of `function`: the name, anything, then "{"
regexp = r"(" + function + ".*[^;]){"
found = False
for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(join(root, filename)) as fh:
            data = fh.read()
        result = re.findall(regexp, data)
        if len(result) > 0:
            sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
            sys.stdout.flush()
            break
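However, this produces some unwanted results.
The regex has to be fault tolerant for combinations such as these:
Find every variation of, say, a "msg" definition, but not a "msg()" call: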
void
shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}
or
void shapex_msg (struct shaper *s)
{
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}
or
void shapex_msg (struct shaper *s) {
  msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
       s->bytes_per_second);
}
Answer 0 (score: 1)
Perhaps something like the following regex:
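import re
def make_regex_for_function(name):
    # Name, optional whitespace, a parameter list with no ";" or ")" inside, then "{"
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))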
Testing it on your examples:
>>> text = '''
... void
... shapex_msg (struct shaper *s)
... {
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
... s->bytes_per_second);
... }
...
... void shapex_msg (struct shaper *s)
... {
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
... s->bytes_per_second);
... }
...
... void shapex_msg (struct shaper *s) {
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second",
... s->bytes_per_second);
... }'''
>>> shapex_msg = make_regex_for_function('shapex_msg')
>>> shapex_msg.findall(text)
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {']
It also works with multi-line definitions:
>>> shapex_msg.findall('int\n \tshapex_msg \t(\nint a,\nint b\n) \n\n\t{')
['\n \tshapex_msg \t(\nint a,\nint b\n) \n\n\t{']
By contrast, a function call is not matched:
>>> shapex_msg.findall('shapex_msg(1,2,3);')
[]
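One caveat with the pattern above: nothing anchors the start of the name, so searching for msg would also match the tail of shapex_msg. Here is a minimal sketch of a stricter variant, assuming a word boundary before the name is acceptable for your code base. The helper name make_strict_regex and the sample source are hypothetical and only serve as illustration.

```python
import re

def make_loose_regex(name):
    # Same pattern as in the answer above.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

def make_strict_regex(name):
    # Hypothetical variant: \b requires the name to start at a word boundary,
    # so 'msg' no longer matches inside 'shapex_msg'.
    return re.compile(r'\b%s\s*\([^;)]*\)\s*\{' % re.escape(name))

# Made-up sample: one definition of shapex_msg (which calls msg) and one of msg.
source = '''void shapex_msg (struct shaper *s) {
  msg (M_INFO, "initialized at %d bytes per second",
       s->bytes_per_second);
}

void msg (int flags, const char *format) {
}
'''

loose = make_loose_regex('msg').findall(source)
strict = make_strict_regex('msg').findall(source)

print(len(loose))  # the loose pattern also counts the shapex_msg definition
print(strict)      # the strict pattern keeps only the msg definition
```

Neither variant understands C syntax, so parameter lists that contain a closing parenthesis (function pointers, for example) are still missed.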
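As a side note, your regex does not work because .* is greedy: it consumes as much as it can, so the match does not stop at the correct closing parenthesis.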
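To make the greediness point concrete, here is a small sketch that runs the pattern from the question on two made-up one-line samples (both strings are hypothetical). In the first, .* runs on to the last { on the line instead of stopping after the parameter list. In the second, the pattern fires on something that is not a function definition at all.

```python
import re

function = 'msg'
# The pattern from the question, unchanged.
greedy = re.compile(r"(" + function + ".*[^;]){")

# Hypothetical sample 1: a definition whose body opens a second brace.
definition = 'void msg (int level) { if (level) { log_it (level); } }'
# Hypothetical sample 2: an if statement on a variable whose name starts with msg.
condition = 'if (msg_level > 0) {'

overshoot = greedy.findall(definition)
false_hit = greedy.findall(condition)

print(overshoot)  # the match runs past the parameter list into the body
print(false_hit)  # a match even though nothing is being defined
```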