Can I restore a function whose closure contains a cycle in Python?

Date: 2014-10-08 07:00:42

Tags: python google-app-engine closures

I am trying to serialize Python functions (code + closure) and restore them later. I'm using the code at the bottom of this post.

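This is very flexible code. It allows serialization and deserialization of inner functions and of closures, for example functions whose context needs to be restored: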

def f1(arg):
    def f2():
        print arg

    def f3():
        print arg
        f2()

    return f3

x = SerialiseFunction(f1(stuff)) # a string
save(x) # save it somewhere

# later, possibly in a different process

x = load() # get it from somewhere 
newf2 = DeserialiseFunction(x)
newf2() # prints value of "stuff" twice

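These calls work even when there are functions in the function's closure, functions in those functions' closures, and so on (we have a graph of closures, where closures contain functions whose closures contain more functions, and so on).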
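The acyclic case described above can be sketched in a few lines. This is a minimal Python 3 sketch, not the question's code: it assumes Python 3 attribute names (`__code__`, `__closure__` instead of 2.7's `func_code`, `func_closure`), and the helpers `dump_function` and `load_function` are hypothetical. It marshals each code object, recurses into closure values that are functions, and rebuilds the cells with the same `make_cell` trick the question uses. Like the original, it does not handle cycles: calling `dump_function` on a self-referential function would recurse until `RecursionError`.

```python
import marshal
import pickle
import sys
import types

def make_cell(value):
    # The inner lambda closes over x, so its only closure cell contains value.
    return (lambda x: lambda: x)(value).__closure__[0]

def dump_function(f):
    # Encode as (code bytes, closure records, module name).
    # Closure values that are functions are encoded recursively.
    records = []
    for cell in f.__closure__ or ():
        value = cell.cell_contents
        if isinstance(value, types.FunctionType):
            records.append(("func", dump_function(value)))
        else:
            records.append(("value", value))
    return (marshal.dumps(f.__code__), records, f.__module__)

def load_function(data):
    code_bytes, records, module_name = data
    # Rebuild nested functions first, then wrap every value in a fresh cell.
    values = [load_function(v) if kind == "func" else v for kind, v in records]
    cells = tuple(make_cell(v) for v in values) or None
    return types.FunctionType(marshal.loads(code_bytes),
                              sys.modules[module_name].__dict__,
                              closure=cells)

def f1(arg):
    def f2():
        return arg
    def f3():
        return (arg, f2())
    return f3

blob = pickle.dumps(dump_function(f1("stuff")))  # bytes that could be stored
restored = load_function(pickle.loads(blob))
print(restored())  # ('stuff', 'stuff')
```

The closure records are rebuilt in the same order as `__closure__`, which matches the order of the code object's `co_freevars`, so `FunctionType` accepts the new cell tuple.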

However, it turns out that these graphs can contain cycles:

def g1():
    def g2():
        g2()
    return g2

g = g1()

If I look at g2's closure (via g), I can see g2 in it:

>>> g
<function g2 at 0x952033c>
>>> g.func_closure[0].cell_contents
<function g2 at 0x952033c>
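The same self-reference can be checked programmatically. This sketch assumes Python 3, where `func_closure` and `func_code` are spelled `__closure__` and `__code__` (Python 2.6+ also accepts these spellings as aliases). Note that `g1` must return `g2` itself rather than call it, otherwise the call recurses forever.

```python
def g1():
    def g2():
        g2()          # g2 refers to itself, so g2 is a free variable of g2
    return g2         # return the function object; do not call it

g = g1()

# The single closure cell of g holds g itself: the closure graph has a cycle.
print(g.__code__.co_freevars)               # ('g2',)
print(g.__closure__[0].cell_contents is g)  # True
```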

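This causes a serious problem when I try to deserialize the function, because everything is immutable. What I need to do is create the function newg2: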

newg2 = types.FunctionType(g2code, globals, closure=newg2closure)

where newg2closure is created like this:

newg2closure = (make_cell(newg2),)

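which of course can't be done; each line depends on the other. Cells are immutable, tuples are immutable, and function objects are immutable (func_closure is read-only).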
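For contrast only, since it does not help on the Python 2.7 App Engine runtime in the question: Python 3.7 made a cell's `cell_contents` writable, and Python 3.8 added `types.CellType`, so on those versions the chicken-and-egg dependency can be broken by creating an empty cell first, building the function around it, and filling the cell afterwards. A sketch under those version assumptions:

```python
import types

def g1():
    def g2():
        return g2       # g2 is a free variable of itself
    return g2

g = g1()

# Empty cell -> function that uses the cell -> fill the cell with the function.
cell = types.CellType()   # empty cell (Python 3.8+)
newg2 = types.FunctionType(g.__code__, g.__globals__, "g2", None, (cell,))
cell.cell_contents = newg2  # assignment allowed since Python 3.7

print(newg2.__closure__[0].cell_contents is newg2)  # True
print(newg2() is newg2)                             # True
```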

So what I want to know is: is there a way to create newg2 above? Is there any way to create a function object that refers to itself in its own closure graph?

I'm using Python 2.7 (I'm on App Engine, so I can't move to Python 3).


For reference, here are my serialization functions:

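import logging
import marshal
import sys
import types

def SerialiseFunction(aFunction):
    # Bug fix: the original tested an undefined name `c` instead of aFunction.
    if not aFunction or not isinstance(aFunction, types.FunctionType):
        raise Exception("First argument required, must be a function")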

    def MarshalClosureValues(aClosure):
        logging.debug(repr(aClosure))
        lmarshalledClosureValues = []
        if aClosure:
            lclosureValues = [lcell.cell_contents for lcell in aClosure]
            lmarshalledClosureValues = [
                [marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure)] if hasattr(litem, "func_code")
                else [marshal.dumps(litem)] 
                for litem in lclosureValues
            ]
        return lmarshalledClosureValues

    lmarshalledFunc = marshal.dumps(aFunction.func_code)
    lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
    lmoduleName = aFunction.__module__

    lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)

    retval = marshal.dumps(lcombined)

    return retval


def DeserialiseFunction(aSerialisedFunction):
    lmarshalledFunc, lmarshalledClosureValues, lmoduleName = marshal.loads(aSerialisedFunction)

    lglobals = sys.modules[lmoduleName].__dict__

    def make_cell(value):
        return (lambda x: lambda: x)(value).func_closure[0]

    def UnmarshalClosureValues(aMarshalledClosureValues):
        lclosure = None
        if aMarshalledClosureValues:
            lclosureValues = [
                    marshal.loads(item[0]) if len(item) == 1 
                    else types.FunctionType(marshal.loads(item[0]), lglobals, closure=UnmarshalClosureValues(item[1])) 
                    for item in aMarshalledClosureValues if len(item) >= 1 and len(item) <= 2
                ]
            lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
        return lclosure

    lfunctionCode = marshal.loads(lmarshalledFunc)
    lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
    lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
    return lfunction

1 answer:

Answer (score: 3)

Here is an approach that works.

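You can't patch up these immutable objects, but what you can do is put proxy functions in place of the circular references, and have those proxies look up the real function in a global dictionary.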
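To see why a proxy breaks the cycle, here is a minimal Python 3 sketch (the names `registry`, `make_proxy` and `build_countdown` are hypothetical). The closure of `countdown` holds a proxy rather than `countdown` itself, so every object can be created in a normal order; the cycle is only closed later, through the dictionary lookup performed at call time.

```python
registry = {}  # hypothetical stand-in for the answer's global dictionary

def make_proxy(key):
    # The proxy holds only a key; the real function is looked up when the
    # proxy is called, so the proxy can be created before its target exists.
    def proxy(*args, **kwargs):
        return registry[key](*args, **kwargs)
    return proxy

def build_countdown(again):
    # `again` is what countdown calls to recurse; it lives in the closure.
    def countdown(n):
        return [] if n == 0 else [n] + again(n - 1)
    return countdown

countdown = build_countdown(make_proxy("countdown"))  # proxy created first
registry["countdown"] = countdown                     # target registered later
print(countdown(3))  # [3, 2, 1]
```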

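1: When serializing, keep track of every function you have already seen. If you see the same one again, don't serialize it again; serialize a sentinel value instead.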

I used a set:

lfunctionHashes = set()

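Then, for each serialized item, check whether it is already in the set. If it is, emit the sentinel; otherwise add it to the set and marshal it properly: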

lhash = hash(litem)
if lhash in lfunctionHashes:
    lmarshalledClosureValues.append([lhash, None])
else:
    lfunctionHashes.add(lhash)
    lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
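The effect of the seen-set on a cyclic graph can be shown in isolation. This Python 3 sketch (the `encode` helper is hypothetical) walks the closure graph the same way and, to keep the output readable, records the function's name where the real code stores `marshal.dumps(item.__code__)`. For the self-referential `g2`, the first visit produces a full record and the second produces the `[hash, None]` sentinel, which is what stops the recursion.

```python
import types

def encode(f, seen=None):
    # Depth-first walk over the closure graph. The first visit to a function
    # emits a full record; any later visit emits a [key, None] sentinel.
    if seen is None:
        seen = set()
    records = []
    for cell in f.__closure__ or ():
        item = cell.cell_contents
        if isinstance(item, types.FunctionType):
            key = hash(item)
            if key in seen:
                records.append([key, None])  # sentinel: already encoded
            else:
                seen.add(key)
                # The name stands in for marshal.dumps(item.__code__) here.
                records.append([key, item.__name__, encode(item, seen)])
        else:
            records.append([item])  # plain closure value
    return records

def g1():
    def g2():
        g2()
    return g2

g = g1()
k = hash(g)
print(encode(g) == [[k, "g2", [[k, None]]]])  # True
```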

2: When deserializing, keep a global dictionary mapping functionhash:function:

gfunctions = {}

During deserialization, add every function you deserialize to gfunctions. Here item is (hash, code, closurevalues, modulename):

lfunction = types.FunctionType(marshal.loads(item[1]), sys.modules[item[3]].__dict__, closure=UnmarshalClosureValues(item[2]))
gfunctions[item[0]] = lfunction

When you encounter a function's sentinel value, use a proxy instead, passing in the function's hash:

lfunction = make_proxy(item[0])

Here is the proxy. It looks up the real function by its hash:

def make_proxy(f_hash):
    def f_proxy(*args, **kwargs):
        # Look up the real function at call time; by then it has been registered.
        f = gfunctions[f_hash]
        return f(*args, **kwargs)

    return f_proxy

I had to make a few other changes as well:

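  • I use pickle instead of marshal in some places; I may look into this further.
  • I include the function's module name in the serialized data along with the code and closure, so that I can look up the correct globals for the function when deserializing.
  • In deserialization, the length of the tuple tells you what you are deserializing: 1 for a plain value, 2 for a function that needs a proxy, 4 for a fully serialized function.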
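The three record shapes (1, 2 and 4 fields) can be exercised end to end with a condensed Python 3 port of the approach. This is a sketch under assumptions, not the answer's code: it uses Python 3 names (`__code__`, `__closure__`), and `serialise`, `deserialise` and the `functions` registry are hypothetical stand-ins for `SerialiseFunction`, `DeserialiseFunction` and `lfunctions`. Keys come from `hash()`, i.e. object identity in the serializing process, so they are only meaningful within one blob. Marshal data is also specific to the Python version, so both sides must run the same interpreter version.

```python
import marshal
import pickle
import sys
import types

functions = {}  # hash -> rebuilt function; plays the role of lfunctions

def make_cell(value):
    # The inner lambda closes over x, so its only closure cell holds value.
    return (lambda x: lambda: x)(value).__closure__[0]

def make_proxy(key):
    # Stands in for a function that may not exist yet; resolved at call time.
    def proxy(*args, **kwargs):
        return functions[key](*args, **kwargs)
    return proxy

def serialise(f):
    seen = set()

    def walk(closure):
        records = []
        for cell in closure or ():
            item = cell.cell_contents
            if isinstance(item, types.FunctionType):
                key = hash(item)
                if key in seen:
                    records.append([key, None])  # 2 fields: use a proxy
                else:
                    seen.add(key)
                    records.append([key, marshal.dumps(item.__code__),
                                    walk(item.__closure__), item.__module__])  # 4 fields
            else:
                records.append([pickle.dumps(item)])  # 1 field: plain value
        return records

    return pickle.dumps((marshal.dumps(f.__code__), walk(f.__closure__), f.__module__))

def deserialise(blob):
    code, records, module = pickle.loads(blob)

    def unwalk(records):
        values = []
        for record in records:
            if len(record) == 1:
                values.append(pickle.loads(record[0]))
            elif len(record) == 2:
                values.append(make_proxy(record[0]))
            else:
                key, fcode, frecords, fmodule = record
                fn = types.FunctionType(marshal.loads(fcode),
                                        sys.modules[fmodule].__dict__,
                                        closure=unwalk(frecords))
                functions[key] = fn  # register so proxies can find it
                values.append(fn)
        return tuple(make_cell(v) for v in values) or None

    return types.FunctionType(marshal.loads(code), sys.modules[module].__dict__,
                              closure=unwalk(records))

def g1(step):
    def g2(n):
        # Self-recursive: g2's closure contains g2 (a cycle) and step.
        return 0 if n <= 0 else step + g2(n - 1)
    return g2

newg = deserialise(serialise(g1(2)))
print(newg(3))  # 6
```

The restored top-level function and the first inner copy are distinct objects, and deeper recursion goes through the proxy; only the proxy's dictionary lookup closes the cycle.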

Here is the complete new code:

import marshal
import pickle
import sys
import types

lfunctions = {}

def DeserialiseFunction(aSerialisedFunction):
    lmarshalledFunc, lmarshalledClosureValues, lmoduleName = pickle.loads(aSerialisedFunction)

    lglobals = sys.modules[lmoduleName].__dict__
    lglobals["lfunctions"] = lfunctions

    def make_proxy(f_hash):
        def f_proxy(*args, **kwargs):
            global lfunctions
            f = lfunctions[f_hash]
            return f(*args, **kwargs)

        return f_proxy

    def make_cell(value):
        return (lambda x: lambda: x)(value).func_closure[0]

    def UnmarshalClosureValues(aMarshalledClosureValues):
        global lfunctions

        lclosure = None
        if aMarshalledClosureValues:
            lclosureValues = []
            for item in aMarshalledClosureValues:
                ltype = len(item)
                if ltype == 1:
                    lclosureValues.append(pickle.loads(item[0]))
                elif ltype == 2:
                    lfunction = make_proxy(item[0])
                    lclosureValues.append(lfunction)
                elif ltype == 4:
                    lfuncglobals = sys.modules[item[3]].__dict__
                    lfuncglobals["lfunctions"] = lfunctions
                    lfunction = types.FunctionType(marshal.loads(item[1]), lfuncglobals, closure=UnmarshalClosureValues(item[2]))
                    lfunctions[item[0]] = lfunction
                    lclosureValues.append(lfunction)
            lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
        return lclosure

    lfunctionCode = marshal.loads(lmarshalledFunc)
    lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
    lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
    return lfunction

def SerialiseFunction(aFunction):
    if not aFunction or not hasattr(aFunction, "func_code"):
        raise Exception ("First argument required, must be a function")

    lfunctionHashes = set()

    def MarshalClosureValues(aClosure, aParentIndices = []):
        lmarshalledClosureValues = []
        if aClosure:
            lclosureValues = [lcell.cell_contents for lcell in aClosure]

            lmarshalledClosureValues = []
            for index, litem in enumerate(lclosureValues):
                lfullIndex = list(aParentIndices)
                lfullIndex.append(index)

                if isinstance(litem, types.FunctionType):
                    lhash = hash(litem)
                    if lhash in lfunctionHashes:
                        lmarshalledClosureValues.append([lhash, None])
                    else:
                        lfunctionHashes.add(lhash)
                        lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
                else:
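                    lmarshalledClosureValues.append([pickle.dumps(litem)])
        # Bug fix: without this return the function yields None and nothing is serialized.
        return lmarshalledClosureValues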

    lmarshalledFunc = marshal.dumps(aFunction.func_code)
    lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
    lmoduleName = aFunction.__module__

    lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)

    retval = pickle.dumps(lcombined)

    return retval