Question

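I honestly don't know how to phrase this. I'm writing a program that reads another Python file named code.py, finds all the VALID dictionary variable names, and prints them. Easy enough, right? But the code I'm running it against is very tricky and deliberately full of examples meant to fool a regex. The test code for code.py is here, and my current code is: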

    import re

    with open("code.py", "r") as myfile:
        data = myfile.read()

    # Capture a word followed by optional non-word characters and an opening brace.
    potato = re.findall(r' *(\w+)\W*{', data, re.M)
    for name in potato:
        print(name)

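The regex doesn't work 100% of the time. When run against the test code, it prints variables that are not supposed to be printed, for example: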

# z={} 
z="z={}"
print('your mother = {}')

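The expected output for the test file is a0, a, b, c, d, e and so on through z, followed by aa, ab, ac, ad and so on through aq.

Anything in the test code that is actually labeled z should not be printed. I realize regex isn't a great tool for this, but I have to use regex to get it done.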

Edit: with the new regex (r'^ *(\w+)\W*{', data, re.M) the output fails on examples where several variables are assigned on one line, such as:

d={
   };e={
        };

Answer 1

l should be printed, but z should not.

potato = re.findall(r'^ *(\w+)\W*{',data,re.M)

This should fix it.
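To see what the `^` anchor changes, here is a minimal sketch that runs both the original and the anchored pattern over a small sample. The `sample` string is an assumption pieced together from the snippets quoted in the question, not the real test file.

```python
import re

# Hypothetical sample assembled from the snippets quoted in the question.
sample = (
    "a={}\n"
    "# z={}\n"
    'z="z={}"\n'
    "print('your mother = {}')\n"
    "d={\n"
    "   };e={\n"
    "        };\n"
)

# Without the anchor, matches can start anywhere, including inside
# comments and string literals.
unanchored = re.findall(r' *(\w+)\W*{', sample, re.M)
print(unanchored)  # ['a', 'z', 'z', 'mother', 'd', 'e']

# With ^ and re.M, only the first word on each line is considered.
anchored = re.findall(r'^ *(\w+)\W*{', sample, re.M)
print(anchored)  # ['a', 'd']
```

The anchored pattern drops the false positives, but it also misses `e`, because `e` does not start a line. That is the one-line assignment failure described in the question's edit.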

Edit:

".*?(?<!\\)"|'.*?(?<!\\)'|\([^)(]*\)|#[^\n]*\n|[^\'\"\#(\w\n]*(\w+)[^\w]*?{

See the demo.

https://regex101.com/r/gP5iH5/6
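A rough sketch of how this longer pattern might be used from Python. The first four alternatives consume double-quoted strings, single-quoted strings, parenthesised groups, and comments without capturing anything, so `re.findall` returns an empty string for each of them and those need filtering out. The names `parts`, `pattern`, and `sample` are illustrative, and the sample is an assumption built from the question's snippets.

```python
import re

# The same alternation as above, split into pieces only for readability.
parts = [
    r'".*?(?<!\\)"',                        # double-quoted string, skipped
    r"'.*?(?<!\\)'",                        # single-quoted string, skipped
    r'\([^)(]*\)',                          # parenthesised group, skipped
    r'#[^\n]*\n',                           # comment, skipped
    r'''[^\'\"\#(\w\n]*(\w+)[^\w]*?{''',    # name followed by an opening brace
]
pattern = "|".join(parts)

# Hypothetical sample assembled from the snippets quoted in the question.
sample = (
    "a={}\n"
    "# z={}\n"
    'z="z={}"\n'
    "print('your mother = {}')\n"
    "d={\n"
    "   };e={\n"
    "        };\n"
)

# Branches that matched without capturing yield '', so drop those.
names = [name for name in re.findall(pattern, sample) if name]
print(names)  # ['a', 'd', 'e']
```

On this sample the commented and quoted `z` are skipped, and `e` is found even though it shares a line with the end of `d`. Other tricky inputs, such as triple-quoted strings, are not covered by this sketch.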

Answer 2

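Trying to parse a Python file with regular expressions will usually get fooled. I'd suggest the following approach instead. The dis library can be used to disassemble the bytecode of the compiled source, and all the dictionaries can be picked out of that.

Assuming there is a Python source file named code.py: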

import code  # assumes code.py can be imported from the current directory
source_module = code
source_py = "code.py"

import sys, dis, re
from contextlib import contextmanager
from io import StringIO

@contextmanager
def captureStdOut(output):
    # Temporarily send everything printed to stdout into `output`.
    stdout = sys.stdout
    sys.stdout = output
    try:
        yield
    finally:
        sys.stdout = stdout

with open(source_py) as f_source:
    source_code = f_source.read()

byte_code = compile(source_code, source_py, "exec")
output = StringIO()

with captureStdOut(output):
    dis.dis(byte_code)      # module-level code, for the globals
    dis.dis(source_module)  # the imported module, for the functions

disassembly = output.getvalue()
# Opcode names vary between CPython versions, so this pattern may need adjusting.
dictionaries = re.findall(r"(?:BUILD_MAP|STORE_MAP).*?(?:STORE_FAST|STORE_NAME).*?\((.*?)\)", disassembly, re.M | re.S)

print(dictionaries)

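Because dis prints to stdout, you need to redirect the output. A regular expression can then be used to find all the entries. I do this twice: once by compiling the source to get the globals, and once by importing the module to get the functions. There may be a better way to do this, but it seems to work.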
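As one possible variation on the same bytecode idea, `dis.get_instructions` (available in Python 3.4 and later) yields instruction objects directly, so no stdout capture or regex over the disassembly text is needed. This is only a sketch under assumptions: the helper `dict_names` and the sets `MAP_OPS` and `STORE_OPS` are hypothetical names, the opcode names are CPython-version specific, and only module-level code is inspected, not function bodies.

```python
import dis

# Opcodes that build a dict literal; which ones appear depends on the CPython version.
MAP_OPS = {"BUILD_MAP", "BUILD_CONST_KEY_MAP"}
# Opcodes that bind the value on top of the stack to a name.
STORE_OPS = {"STORE_NAME", "STORE_FAST", "STORE_GLOBAL"}

def dict_names(source, filename="<source>"):
    """Return names assigned directly from a dict literal at module level."""
    code_obj = compile(source, filename, "exec")
    names = []
    previous = None
    for instr in dis.get_instructions(code_obj):
        # A dict-building opcode immediately followed by a store is an assignment.
        if previous is not None and previous.opname in MAP_OPS and instr.opname in STORE_OPS:
            names.append(instr.argval)
        previous = instr
    return names

# Hypothetical sample; the comment and the string literal are ignored by the compiler.
sample = 'a = {}\nb = {"k": 1}\nz = "z={}"\n# y = {}\n'
print(dict_names(sample))
```

Because the compiler has already discarded comments and treats string contents as plain constants, the decoy `z` lines never produce a dict-building opcode. Dict comprehensions and very large literals compile differently and would need extra handling.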

Python regex: reading dictionary variable names from another file
