Chaining memoizers in Python

Time: 2017-01-23 17:08:09

Tags: python memoization

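I already have a memoizer that works well. It serializes the inputs with pickle dumps and uses the MD5 hash of that serialization as the key. The function results are quite large and are stored as pickle files whose file names are the MD5 hashes. When I call two memoized functions one after the other, the memoizer loads the output of the first function and passes it to the second function. The second function then serializes it, computes the MD5, and loads its own output. Here is a very simple example: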

@memoize
def f(x):
    ...
    return y

@memoize
def g(x):
    ...
    return y

y1 = f(x1)
y2 = g(y1)
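y1 is loaded from disk when f is evaluated, and then it is serialized again when g is evaluated. Is it possible to somehow bypass this step and pass the key of y1 (i.e. its MD5 hash) to g? If g already has this key, it loads y2 from disk. If it doesn't, it "requests" the full y1 in order to evaluate g.

Edit: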

import cPickle as pickle
import inspect
import hashlib

class memoize(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        file_name = self._get_key(*args, **kwargs)
        try:
            # Pickle files must be opened in binary mode.
            with open(file_name, "rb") as f:
                out = pickle.load(f)
        except (IOError, EOFError, pickle.UnpicklingError):
            # Cache miss or unreadable file: compute and store the result.
            out = self.func(*args, **kwargs)
            with open(file_name, "wb") as f:
                pickle.dump(out, f, 2)

        return out

    def _arg_hash(self, *args, **kwargs):
        _str = pickle.dumps(args, 2) + pickle.dumps(kwargs, 2)
        return hashlib.md5(_str).hexdigest()

    def _src_hash(self):
        _src = inspect.getsource(self.func)
        return hashlib.md5(_src).hexdigest()

    def _get_key(self, *args, **kwargs):
        arg = self._arg_hash(*args, **kwargs)
        src = self._src_hash()
        return src + '_' + arg + '.pkl'

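1 answer:

Answer 0 (score: 3):

I think you could do this automatically, but in general I think it's better to be explicit about "lazy" evaluation. So I'll present a way that adds an extra argument to your memoized functions: lazy. But instead of files, pickle and md5, I'll simplify the helpers a bit: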

# I use a dictionary as storage instead of files
storage = {}

# No md5, just hash
def calculate_md5(obj):
    print('calculating md5 of', obj)
    return hash(obj)

# create dictionary entry instead of pickling the data to a file
def create_file(md5, data):
    print('creating file for md5', md5)
    storage[md5] = data

# Load dictionary entry instead of unpickling a file
def load_file(md5):
    print('loading file with md5 of', md5)
    return storage[md5]

I use a custom class as the intermediate object:

class MemoizedObject(object):
    def __init__(self, md5):
        self.md5 = md5

    def get_real_data(self):
        print('load...')
        return load_file(self.md5)

    def __repr__(self):
        return '{self.__class__.__name__}(md5={self.md5})'.format(self=self)

Finally, here is the changed Memoize, assuming your functions take only one argument:

class Memoize(object):
    def __init__(self, func):
        self.func = func
        # The md5 to md5 storage is needed to find the result file
        # or result md5 for lazy evaluation.
        self.md5_to_md5_storage = {}

    def __call__(self, x, lazy=False):
        # If the argument is a memoized object there is no need to
        # calculate the hash, we can just look it up.
        if isinstance(x, MemoizedObject):
            key = x.md5
        else:
            key = calculate_md5(x)

        if lazy and key in self.md5_to_md5_storage:
            # Check if the key is present in the md5 to md5 storage, otherwise
            # we can't be lazy
            return MemoizedObject(self.md5_to_md5_storage[key])
        elif not lazy and key in self.md5_to_md5_storage:
            # Not lazy but we know the result
            result = load_file(self.md5_to_md5_storage[key])
        else:
            # Unknown argument
            result = self.func(x)
            result_md5 = calculate_md5(result)
            create_file(result_md5, result)
            self.md5_to_md5_storage[key] = result_md5
        return result

Now, if you call your functions and specify lazy at the right positions, you can avoid loading (unpickling) your files:

@Memoize
def f(x):
    return x+1

@Memoize
def g(x):
    return x+2

Normal (first) run:

>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
calculating md5 of 11
creating file for md5 11
>>> y2 = g(y1)
calculating md5 of 11
calculating md5 of 13
creating file for md5 13

Without lazy:

>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
loading file with md5 of 11
>>> y2 = g(y1)
calculating md5 of 11
loading file with md5 of 13

With lazy=True:

>>> x1 = 10
>>> y1 = f(x1, lazy=True)
calculating md5 of 10
>>> y2 = g(y1)
loading file with md5 of 13

The last option only calculates the "md5" of the first argument and loads the file of the end result. That should be exactly what you want.
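
One case the runs above do not exercise is a cache miss in g when it receives a MemoizedObject: as written, self.func would then be called with the wrapper rather than the real data. The question asks for the full y1 to be "requested" in that situation, so a possible variant is sketched below. It reuses the simplified dictionary helpers (without the prints) and is only a sketch; the names is_handle and real_x are mine, and hash() again stands in for a real MD5.

```python
storage = {}  # in-memory stand-in for the pickle files


def calculate_md5(obj):
    return hash(obj)  # stand-in for a real MD5 of the pickled object


def create_file(md5, data):
    storage[md5] = data


def load_file(md5):
    return storage[md5]


class MemoizedObject(object):
    def __init__(self, md5):
        self.md5 = md5

    def get_real_data(self):
        return load_file(self.md5)


class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.md5_to_md5_storage = {}  # argument key -> result key

    def __call__(self, x, lazy=False):
        is_handle = isinstance(x, MemoizedObject)
        key = x.md5 if is_handle else calculate_md5(x)
        if key in self.md5_to_md5_storage:
            result_md5 = self.md5_to_md5_storage[key]
            if lazy:
                return MemoizedObject(result_md5)
            return load_file(result_md5)
        # Cache miss: only now is the real argument needed, so load it.
        real_x = x.get_real_data() if is_handle else x
        result = self.func(real_x)
        result_md5 = calculate_md5(result)
        create_file(result_md5, result)
        self.md5_to_md5_storage[key] = result_md5
        return result


@Memoize
def f(x):
    return x + 1


@Memoize
def g(x):
    return x + 2


f(10)                      # first run fills the cache for f
handle = f(10, lazy=True)  # only the key comes back, nothing is loaded
y2 = g(handle)             # g has not seen this key, so it loads 11 and computes 13
```

This works because the key stored in the handle is the hash of the real result, so g finds the same cache entry whether it is later called with the handle or with the plain value 11.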