Question

In the context of a complex application, I need to import user-supplied "scripts". Ideally, a script would have

def init():
    blah

def execute():
    more blah

def cleanup():
    yadda

So I just do

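import imp  # deprecated since Python 3.4, removed in 3.12
fileobj, path, desc = imp.find_module(userscript)  # fileobj is an open file object
foo = imp.load_module(userscript, fileobj, path, desc)  # runs the whole script
foo.init()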

However, as is well known, the user's script is executed as soon as load_module runs. This means a script could look like this:

def init():
    blah

yadda

with the yadda part getting called as soon as I import the script.
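Side note on the loading code: the imp module was deprecated in Python 3.4 and removed in 3.12. A rough importlib-based equivalent is sketched below. `load_user_script` is a hypothetical helper name, and it takes an explicit file path rather than searching sys.path the way imp.find_module does. It has exactly the same problem: exec_module runs the module body, yadda part included.

```python
import importlib.util
import sys


def load_user_script(name, path):
    """Rough importlib stand-in for imp.find_module + imp.load_module.

    Hypothetical helper: takes an explicit file path instead of searching
    sys.path. Like load_module, it executes the whole module body.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load {!r} from {!r}".format(name, path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)  # all module-level code runs here
    except BaseException:
        sys.modules.pop(name, None)  # don't leave a half-initialised module behind
        raise
    return module
```

Usage mirrors the question's code: foo = load_user_script(userscript, script_path) followed by foo.init(), where script_path is whatever path the application resolves for the script.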

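What I need is a way to:

first check whether it has init(), execute() and cleanup()
if they exist, all is well
if they don't exist, complain
not run any other code, or at least not until I know whether there is an init()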

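Normally I'd enforce the same old if __name__ == '__main__' trick, but I have little control over the user-supplied scripts, so I'm looking for a relatively painless solution. I have seen all sorts of complicated tricks, including parsing the script, but nothing really simple. I'm surprised this doesn't exist... or maybe I'm missing something.

Thanks.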

Answer 1

My attempt, using the ast module:

import ast

# which syntax elements are allowed at module level?
whitelist = [
    # docstring (ast.Constant replaces ast.Str, which was removed in Python 3.12)
    lambda x: (isinstance(x, ast.Expr)
               and isinstance(x.value, ast.Constant)
               and isinstance(x.value.value, str)),
    # import / from ... import
    lambda x: isinstance(x, (ast.Import, ast.ImportFrom)),
    # class (note: the class body itself still runs on import)
    lambda x: isinstance(x, ast.ClassDef),
    # function
    lambda x: isinstance(x, ast.FunctionDef),
]

def validate(source, required_functions):
    tree = ast.parse(source)

    functions = set()
    required_functions = set(required_functions)

    for item in tree.body:
        # note: decorators and default argument values still run on import
        if isinstance(item, ast.FunctionDef):
            functions.add(item.name)
            continue

        # reject any module-level statement that no checker accepts
        if not any(checker(item) for checker in whitelist):
            return False

    # at least the required functions must be there
    return required_functions <= functions


if __name__ == "__main__":
    required_funcs = ["init", "execute", "cleanup"]
    with open("/tmp/test.py", "rb") as f:
        print("yay!" if validate(f.read(), required_funcs) else "d'oh!")
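The question also asks to complain about what is wrong, not just get a yes/no answer. A hedged variant of the same whitelist idea that collects messages instead of returning False; `problems` and `REQUIRED` are hypothetical names, and it assumes Python 3.8+, where string literals parse as ast.Constant.

```python
import ast

REQUIRED = {"init", "execute", "cleanup"}  # names from the question


def problems(source, required=REQUIRED):
    """Return a list of complaints about `source`; an empty list means it passed."""
    tree = ast.parse(source)
    found = set()
    complaints = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            found.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom, ast.ClassDef)):
            continue  # allowed (class bodies still run on import, though)
        elif (isinstance(node, ast.Expr)
              and isinstance(node.value, ast.Constant)
              and isinstance(node.value.value, str)):
            continue  # docstring or other bare string literal
        else:
            complaints.append("line {}: top-level {} is not allowed".format(
                node.lineno, type(node).__name__))
    for name in sorted(required - found):
        complaints.append("missing function {}()".format(name))
    return complaints
```

For the question's second example (def init() followed by a bare yadda), this reports the yadda statement on line 4 plus the missing cleanup() and execute(), and nothing is executed.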

Answer 2

Here's a simpler (and more naive) alternative to the AST approach:

import sys
import types
from imp import find_module, new_module, PY_SOURCE  # imp was removed in Python 3.12


EXPECTED = ("init", "execute", "cleanup")

def import_script(name):
    fileobj, path, description = find_module(name)

    if description[2] != PY_SOURCE:
        if fileobj is not None:
            fileobj.close()
        raise ImportError("no source file found")

    # compile only (nothing runs yet) and close the file find_module opened
    with fileobj:
        code = compile(fileobj.read(), path, "exec")

    # module-level def (and class) statements store their code objects in co_consts
    expected = list(EXPECTED)
    for const in code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name in expected:
            expected.remove(const.co_name)
    if expected:
        raise ImportError("missing expected function: {}".format(expected))

    # checks passed: only now execute the module body
    module = new_module(name)
    exec(code, module.__dict__)
    sys.modules[name] = module
    return module
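Since imp is gone in Python 3.12, here is a sketch of the same co_consts check on top of importlib.util.find_spec and module_from_spec. It keeps the answer's idea (compile, inspect the code object's constants, only then exec) and, like the original, only accepts plain .py source files. Be aware that class bodies also leave code objects in co_consts, so a class named init would satisfy this check.

```python
import importlib.util
import sys
import types

EXPECTED = ("init", "execute", "cleanup")


def import_script(name):
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None or not spec.origin.endswith(".py"):
        raise ImportError("no source file found for {!r}".format(name))

    with open(spec.origin, "rb") as f:
        code = compile(f.read(), spec.origin, "exec")  # compile only, nothing runs yet

    # module-level def (and class) statements keep their code objects in co_consts
    defined = {c.co_name for c in code.co_consts if isinstance(c, types.CodeType)}
    missing = [n for n in EXPECTED if n not in defined]
    if missing:
        raise ImportError("missing expected function: {}".format(missing))

    module = importlib.util.module_from_spec(spec)
    exec(code, module.__dict__)  # only now does module-level code run
    sys.modules[name] = module
    return module
```

A script that fails the check is never executed and never lands in sys.modules.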

Keep in mind that this is a very direct approach that bypasses Python's import machinery.

Answer 3

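First, instead of requiring a set of functions, I would require a class that conforms to a specified interface, using the abc module or zope.interface. This forces the module's author to provide the functions you want.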
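A minimal sketch of the abc variant, assuming the application ships a base class for script authors; ScriptBase, GoodScript and IncompleteScript are hypothetical names. With abc, the complaint comes when the application instantiates the script's class (a TypeError naming the missing abstract methods), not at import time, so any module-level code in the script has still run by then.

```python
from abc import ABC, abstractmethod


class ScriptBase(ABC):
    """Hypothetical interface the application would ship to script authors."""

    @abstractmethod
    def init(self): ...

    @abstractmethod
    def execute(self): ...

    @abstractmethod
    def cleanup(self): ...


class GoodScript(ScriptBase):
    """A script that implements the full interface."""

    def init(self):
        return "init"

    def execute(self):
        return "execute"

    def cleanup(self):
        return "cleanup"


class IncompleteScript(ScriptBase):
    """Forgets execute() and cleanup(), so it cannot be instantiated."""

    def init(self):
        return "init"
```

GoodScript() works normally, while IncompleteScript() raises TypeError, which is where the application can complain.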

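Second, I wouldn't bother looking for module-level code. If the author writes any, that's the author's problem. It's too much work for no real benefit.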

If you're worried about security, you'll need to sandbox the code somehow anyway.

Answer 4

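Not sure if you'd consider this elegant, but it is somewhat intelligent in that it recognizes when def init are real tokens rather than just part of a tricky multiline string: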

'''
def init does not define init...
'''

It won't recognize init when it is defined in tricky alternative ways, such as

init = lambda ...

or

codestr='def  i'+'nit ...'
exec(codestr)

The only way to handle all such cases is to run the code (e.g. in a sandbox or by importing it) and inspect the result.

import tokenize
import token
import io
import collections

userscript = '''\
def init():
    blah

"""
def execute():
    more blah
"""

yadda
'''

class Token(object):
    def __init__(self, tok):
        toknum, tokval, (srow, scol), (erow, ecol), line = tok
        self.toknum = toknum
        self.tokname = token.tok_name[toknum]
        self.tokval = tokval
        self.srow = srow
        self.scol = scol
        self.erow = erow
        self.ecol = ecol
        self.line = line

class Validator(object):
    def __init__(self, codestr):
        self.codestr = codestr
        self.toks = collections.deque(maxlen=2)  # sliding window of two tokens
        self.names = set()
    def validate(self):
        tokens = tokenize.generate_tokens(io.StringIO(self.codestr).readline)
        self.toks.append(Token(next(tokens)))
        for tok in tokens:
            self.toks.append(Token(tok))
            if (self.toks[0].tokname == 'NAME'     # First token is a name
                and self.toks[0].scol == 0         # First token starts at col 0
                and self.toks[0].tokval == 'def'   # First token is 'def'
                and self.toks[1].tokname == 'NAME' # Next token is a name
                ):
                self.names.add(self.toks[1].tokval)
        delta = {'init', 'cleanup', 'execute'} - self.names
        if delta:
            # sorted() makes the order in the message deterministic
            raise ValueError('{n} not defined'.format(n=' and '.join(sorted(delta))))

v = Validator(userscript)
v.validate()

yields

ValueError: cleanup and execute not defined
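The answer's last point, that only running the code catches cases like init = lambda ... or exec(codestr), can be sketched by running the script in a separate interpreter with runpy.run_path and reporting back which required names ended up callable. This keeps side effects out of the host process but is NOT a security sandbox; callable_names and _CHILD are hypothetical names, and the sketch assumes the script prints nothing after its module body finishes.

```python
import json
import subprocess
import sys

# Child program: run the script (as "<run_path>", so __main__ blocks are skipped)
# and print, as its last line, which required names are callables.
_CHILD = r"""
import json, runpy, sys
ns = runpy.run_path(sys.argv[1])
print(json.dumps(sorted(n for n in ("init", "execute", "cleanup") if callable(ns.get(n)))))
"""


def callable_names(path, timeout=10):
    """Run the script at `path` in a separate interpreter and return the set of
    required names it defines as callables. Not a security sandbox."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, path],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    # the script's own prints come first; the JSON report is the last line
    return set(json.loads(result.stdout.splitlines()[-1]))
```

If the script raises at module level (the bare yadda above would be a NameError), the child exits with an error and subprocess.run raises CalledProcessError, which is itself a reasonable point to complain.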

Answer 5

A very simple solution could be to check how each line of code begins; the only allowed forms should be:

def init():
def execute():
def cleanup():
lines starting with 4 spaces
[optional]: lines starting with #

This is very primitive, but it meets your requirements...

Update: after a second thought, I realized it's not that easy after all. Consider this piece of code:

def init():
    v = """abc
def
ghi"""
    print(v)

This means you need a more complex code-parsing algorithm... so forget my solution...
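Letting Python's own parser do the work avoids the problem in the update: ast.parse treats the "def" line inside the triple-quoted string as part of a string constant, not as a statement. A minimal sketch; top_level_defs is a hypothetical helper that only collects names, so a whitelist check like the one in the ast answer would still be needed to reject stray statements such as yadda.

```python
import ast


def top_level_defs(source):
    """Names of functions defined by module-level `def` statements.

    ast.parse tokenizes properly, so text inside string literals is never
    mistaken for a definition.
    """
    tree = ast.parse(source)
    return {node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


# the example from the update above
example = '''\
def init():
    v = """abc
def
ghi"""
    print(v)
'''
```

For this example, top_level_defs(example) returns {'init'}: the stray def line is only string content.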

Answer 6

A solution to points 1 to 3 (not the yadda part) is to distribute a "generic_class.py" containing all the methods you need. So,

class Generic(object):

    def __init__(self):
        return

    def execute(self):
        return

    # etc

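Then you can check whether "Generic" exists in what you imported. If it doesn't, you can ignore the script; if it does, you know exactly what it is. Nothing extra will get called unless it is called from one of your predefined methods.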
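One way to do that check, assuming script authors are asked to subclass Generic: after importing the user's module, scan it for a subclass with inspect.getmembers. find_script_class is a hypothetical helper, and the Generic class here just mirrors the one above so the example is self-contained. Importing still executes any module-level code, which is the yadda part this approach does not address.

```python
import inspect


class Generic(object):
    """Stand-in for the base class shipped in generic_class.py."""

    def __init__(self):
        return

    def execute(self):
        return


def find_script_class(module, base=Generic):
    """Return the first class in `module` that subclasses `base`, else None."""
    for _, obj in inspect.getmembers(module, inspect.isclass):
        # skip the base class itself, which the script imports
        if obj is not base and issubclass(obj, base):
            return obj
    return None
```

If find_script_class returns None, the application can skip the script; otherwise it instantiates the returned class and calls only the predefined methods.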

Import a Python module without actually executing it
