我试图序列化Python函数(代码+闭包),稍后再恢复它们。我正在使用本文底部的代码。
这是非常灵活的代码。它允许内部函数的序列化和反序列化,以及闭包函数,例如需要恢复其上下文的函数:
def f1(arg):
def f2():
print arg
def f3():
print arg
f2()
return f3
x = SerialiseFunction(f1(stuff)) # a string
save(x) # save it somewhere
# later, possibly in a different process
x = load() # get it from somewhere
newf2 = DeserialiseFunction(x)
newf2() # prints value of "stuff" twice
即使函数闭包中存在函数,闭包中的函数等等,这些调用也会起作用(我们有一个闭包图,其中闭包包含具有包含更多函数的闭包的函数,等等)
然而,事实证明这些图表可以包含周期:
def g1():
def g2():
g2()
return g2()
g = g1()
如果我查看g2
关闭(通过g
),我可以在其中看到g2
:
>>> g
<function g2 at 0x952033c>
>>> g.func_closure[0].cell_contents
<function g2 at 0x952033c>
当我尝试反序列化函数时,这会导致严重的问题,因为一切都是不可变的。我需要做的是创建函数newg2
:
newg2 = types.FunctionType(g2code, globals, closure=newg2closure)
其中newg2closure
的创建方式如下:
newg2closure = (make_cell(newg2),)
当然无法完成;每行代码都依赖于另一行。单元格是不可变的,元组是不可变的,函数类型是不可变的。
所以我想知道的是,有没有办法在上面创建newg2
?有没有什么方法可以创建一个函数类型对象,在其自己的闭包图中提到该对象?
我正在使用python 2.7(我在App Engine上,所以我不能进入Python 3)。
供参考,我的序列化功能:
def SerialiseFunction(aFunction):
if not aFunction or not isinstance(c, types.FunctionType):
raise Exception ("First argument required, must be a function")
def MarshalClosureValues(aClosure):
logging.debug(repr(aClosure))
lmarshalledClosureValues = []
if aClosure:
lclosureValues = [lcell.cell_contents for lcell in aClosure]
lmarshalledClosureValues = [
[marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure)] if hasattr(litem, "func_code")
else [marshal.dumps(litem)]
for litem in lclosureValues
]
return lmarshalledClosureValues
lmarshalledFunc = marshal.dumps(aFunction.func_code)
lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
lmoduleName = aFunction.__module__
lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)
retval = marshal.dumps(lcombined)
return retval
def DeserialiseFunction(aSerialisedFunction):
lmarshalledFunc, lmarshalledClosureValues, lmoduleName = marshal.loads(aSerialisedFunction)
lglobals = sys.modules[lmoduleName].__dict__
def make_cell(value):
return (lambda x: lambda: x)(value).func_closure[0]
def UnmarshalClosureValues(aMarshalledClosureValues):
lclosure = None
if aMarshalledClosureValues:
lclosureValues = [
marshal.loads(item[0]) if len(item) == 1
else types.FunctionType(marshal.loads(item[0]), lglobals, closure=UnmarshalClosureValues(item[1]))
for item in aMarshalledClosureValues if len(item) >= 1 and len(item) <= 2
]
lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
return lclosure
lfunctionCode = marshal.loads(lmarshalledFunc)
lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
return lfunction
答案 0 :(得分:3)
这是一种有效的方法。
您无法修复这些不可变对象,但您可以做的是使用代理函数代替循环引用,并让它们在全局字典中查找实际函数。
1:序列化时,跟踪您已经看过的所有功能。如果您再次看到同一个,请不要重新序列化,而是序列化一个标记值。
我使用过一套:
lfunctionHashes = set()
并且对于每个序列化项目,检查它是否在集合中,如果是,请使用哨兵,否则将其添加到集合并正确编组:
lhash = hash(litem)
if lhash in lfunctionHashes:
lmarshalledClosureValues.append([lhash, None])
else:
lfunctionHashes.add(lhash)
lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
2:反序列化时,保留functionhash:function
的全局词典gfunctions = {}
在反序列化期间,每次对函数进行反序列化时,都要将其添加到gfunctions中。这里的item是(hash,code,closurevalues,modulename):
lfunction = types.FunctionType(marshal.loads(item[1]), globals, closure=UnmarshalClosureValues(item[2]))
gfunctions[item[0]] = lfunction
当你遇到一个函数的sentinel值时,使用代理,传入函数的散列:
lfunction = make_proxy(item[0])
这是代理人。它根据哈希查找实际函数:
def make_proxy(f_hash):
def f_proxy(*args, **kwargs):
global gfunctions
f = lfunctions[f_hash]
f(*args, **kwargs)
return f_proxy
我还必须做出其他一些改变:
这是全新的代码。
lfunctions = {}
def DeserialiseFunction(aSerialisedFunction):
lmarshalledFunc, lmarshalledClosureValues, lmoduleName = pickle.loads(aSerialisedFunction)
lglobals = sys.modules[lmoduleName].__dict__
lglobals["lfunctions"] = lfunctions
def make_proxy(f_hash):
def f_proxy(*args, **kwargs):
global lfunctions
f = lfunctions[f_hash]
f(*args, **kwargs)
return f_proxy
def make_cell(value):
return (lambda x: lambda: x)(value).func_closure[0]
def UnmarshalClosureValues(aMarshalledClosureValues):
global lfunctions
lclosure = None
if aMarshalledClosureValues:
lclosureValues = []
for item in aMarshalledClosureValues:
ltype = len(item)
if ltype == 1:
lclosureValues.append(pickle.loads(item[0]))
elif ltype == 2:
lfunction = make_proxy(item[0])
lclosureValues.append(lfunction)
elif ltype == 4:
lfuncglobals = sys.modules[item[3]].__dict__
lfuncglobals["lfunctions"] = lfunctions
lfunction = types.FunctionType(marshal.loads(item[1]), lfuncglobals, closure=UnmarshalClosureValues(item[2]))
lfunctions[item[0]] = lfunction
lclosureValues.append(lfunction)
lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
return lclosure
lfunctionCode = marshal.loads(lmarshalledFunc)
lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
return lfunction
def SerialiseFunction(aFunction):
if not aFunction or not hasattr(aFunction, "func_code"):
raise Exception ("First argument required, must be a function")
lfunctionHashes = set()
def MarshalClosureValues(aClosure, aParentIndices = []):
lmarshalledClosureValues = []
if aClosure:
lclosureValues = [lcell.cell_contents for lcell in aClosure]
lmarshalledClosureValues = []
for index, litem in enumerate(lclosureValues):
lfullIndex = list(aParentIndices)
lfullIndex.append(index)
if isinstance(litem, types.FunctionType):
lhash = hash(litem)
if lhash in lfunctionHashes:
lmarshalledClosureValues.append([lhash, None])
else:
lfunctionHashes.add(lhash)
lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
else:
lmarshalledClosureValues.append([pickle.dumps(litem)])
lmarshalledFunc = marshal.dumps(aFunction.func_code)
lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
lmoduleName = aFunction.__module__
lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)
retval = pickle.dumps(lcombined)
return retval