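I already have a memoizer that works well. It serializes the inputs with pickle dumps and uses their MD5 hash as the key. The function results are very large and are stored as pickle files whose file names are the MD5 hashes. When I call two memoized functions one after the other, the memoizer loads the output of the first function and passes it to the second. The second function then serializes it, computes the MD5, and loads its own output. Here is a very simple example: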
@memoize
def f(x):
    ...
    return y

@memoize
def g(x):
    ...
    return y

y1 = f(x1)
y2 = g(y1)
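y1 is loaded from disk when f is evaluated, and then it is serialized again when g is evaluated. Is it possible to bypass this step somehow and pass only the key of y1 (i.e. its MD5 hash) to g? If g already has this key, it loads y2 from disk. If it does not, it "requests" the full y1 in order to evaluate g.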
Edit:
# Python 2 code (cPickle, inspect.getargspec)
import cPickle as pickle
import inspect
import hashlib

class memoize(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        arg = inspect.getargspec(self.func).args
        file_name = self._get_key(*args, **kwargs)
        try:
            # Pickle files are binary, so open them in "rb" mode
            f = open(file_name, "rb")
            out = pickle.load(f)
            f.close()
        except:
            # Cache miss: compute the result and store it on disk
            out = self.func(*args, **kwargs)
            f = open(file_name, "wb")
            pickle.dump(out, f, 2)
            f.close()
        return out

    def _arg_hash(self, *args, **kwargs):
        _str = pickle.dumps(args, 2) + pickle.dumps(kwargs, 2)
        return hashlib.md5(_str).hexdigest()

    def _src_hash(self):
        _src = inspect.getsource(self.func)
        return hashlib.md5(_src).hexdigest()

    def _get_key(self, *args, **kwargs):
        arg = self._arg_hash(*args, **kwargs)
        src = self._src_hash()
        return src + '_' + arg + '.pkl'
Answer 0 (score: 3)
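I think this could be done automatically, but in general I find it better to make "lazy" evaluation explicit. So I'll show a way to add an extra argument to your memoized functions: lazy. Instead of files, pickle, and MD5, though, I'll simplify the helpers a bit: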
# I use a dictionary as storage instead of files
storage = {}

# No md5, just hash
def calculate_md5(obj):
    print('calculating md5 of', obj)
    return hash(obj)

# create dictionary entry instead of pickling the data to a file
def create_file(md5, data):
    print('creating file for md5', md5)
    storage[md5] = data

# Load dictionary entry instead of unpickling a file
def load_file(md5):
    print('loading file with md5 of', md5)
    return storage[md5]
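For reference, file-backed versions of the same three helpers might look roughly like the sketch below. It is only an illustration under stated assumptions: Python 3, a hypothetical `CACHE_DIR` created with `tempfile`, and a private `_path` helper that is not part of the original code. Note also that hashing `pickle.dumps` output is only a stable key as long as equal inputs pickle to identical bytes, which is not guaranteed for every type.

```python
import hashlib
import os
import pickle
import tempfile

# Hypothetical cache directory; any writable directory would do.
CACHE_DIR = tempfile.mkdtemp(prefix="memo_")


def calculate_md5(obj):
    # MD5 hex digest of the pickled object, as described in the question.
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def _path(md5):
    # Hypothetical helper: map a digest to its pickle file.
    return os.path.join(CACHE_DIR, md5 + ".pkl")


def create_file(md5, data):
    # Pickle files are binary, so write in "wb" mode.
    with open(_path(md5), "wb") as fh:
        pickle.dump(data, fh, protocol=2)


def load_file(md5):
    # Read the pickled result back in "rb" mode.
    with open(_path(md5), "rb") as fh:
        return pickle.load(fh)
```

With these definitions, `create_file(calculate_md5(data), data)` followed by `load_file(calculate_md5(data))` should round-trip any picklable `data`.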
I use a custom class as the intermediate object:
class MemoizedObject(object):
    def __init__(self, md5):
        # Store the md5 of the result this object stands in for
        self.md5 = md5

    def get_real_data(self):
        print('load...')
        return load_file(self.md5)

    def __repr__(self):
        return '{self.__class__.__name__}(md5={self.md5})'.format(self=self)
Finally, here is the changed Memoize, assuming your functions take only one argument:
class Memoize(object):
    def __init__(self, func):
        self.func = func
        # The md5 to md5 storage is needed to find the result file
        # or result md5 for lazy evaluation.
        self.md5_to_md5_storage = {}

    def __call__(self, x, lazy=False):
        # If the argument is a memoized object no need to
        # calculate the hash, we can just look it up.
        if isinstance(x, MemoizedObject):
            key = x.md5
        else:
            key = calculate_md5(x)
        if lazy and key in self.md5_to_md5_storage:
            # Check if the key is present in the md5 to md5 storage, otherwise
            # we can't be lazy
            return MemoizedObject(self.md5_to_md5_storage[key])
        elif not lazy and key in self.md5_to_md5_storage:
            # Not lazy but we know the result
            result = load_file(self.md5_to_md5_storage[key])
        else:
            # Unknown argument: a lazy placeholder must be loaded first
            if isinstance(x, MemoizedObject):
                x = x.get_real_data()
            result = self.func(x)
            result_md5 = calculate_md5(result)
            create_file(result_md5, result)
            self.md5_to_md5_storage[key] = result_md5
        return result
Now, if you call your functions and specify lazy in the right places, you can avoid loading (unpickling) your file:
@Memoize
def f(x):
    return x+1

@Memoize
def g(x):
    return x+2
Normal (first) run:
>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
calculating md5 of 11
creating file for md5 11
>>> y2 = g(y1)
calculating md5 of 11
calculating md5 of 13
creating file for md5 13
Without lazy (second run):
>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
loading file with md5 of 11
>>> y2 = g(y1)
calculating md5 of 11
loading file with md5 of 13
With lazy=True:
>>> x1 = 10
>>> y1 = f(x1, lazy=True)
calculating md5 of 10
>>> y2 = g(y1)
loading file with md5 of 13
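The last option computes only the "md5" of the first argument and loads the file for the final result. That should be exactly what you want.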
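The Memoize class above assumes a single argument. One possible way to extend the key lookup to several positional and keyword arguments is sketched below. This is an illustration rather than part of the original answer: `make_key` is a hypothetical helper, `calculate_md5` is assumed here to be an MD5 of the pickled value (as in the question), and `MemoizedObject` is reduced to the one attribute the sketch needs. Each argument contributes its own digest, and a lazy placeholder contributes the digest it already carries, so nothing has to be loaded or re-serialized.

```python
import hashlib
import pickle


class MemoizedObject(object):
    # Reduced stand-in: only carries the md5 of a stored result.
    def __init__(self, md5):
        self.md5 = md5


def calculate_md5(obj):
    # Assumption: MD5 hex digest of the pickled value.
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def make_key(args, kwargs):
    # Hypothetical helper: combine per-argument digests into one key.
    def digest(value):
        # A lazy placeholder already knows its digest; reuse it.
        if isinstance(value, MemoizedObject):
            return value.md5
        return calculate_md5(value)

    parts = [digest(a) for a in args]
    # Sort keyword names so the key does not depend on call order.
    parts += [name + "=" + digest(kwargs[name]) for name in sorted(kwargs)]
    return hashlib.md5("|".join(parts).encode("ascii")).hexdigest()
```

Because a placeholder and the real value it stands for produce the same per-argument digest, `make_key((y1_lazy, 2), {})` and `make_key((y1_real, 2), {})` should give the same key, which is what lets a later function find its cached result without loading the intermediate data.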