Decorator for selective caching / memoization

Question

I am looking for a way to build a decorator @memoize that I can use on functions like this:

@memoize
def my_function(a, b, c):
    # Do stuff
    # result may not always be the same for fixed (a, b, c)
    return result

Then, if I do:

result1 = my_function(a=1, b=2, c=3)
# my_function runs (slow). We cache the result for later

result2 = my_function(a=1, b=2, c=3)
# The decorator reads the cache and returns the result (fast)

Now say I want to force a cache update:

result3 = my_function(a=1, b=2, c=3, force_update=True)
# The function runs *again* for values a, b, and c. 

result4 = my_function(a=1, b=2, c=3)
# We read the cache

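At the end of the above, result4 == result3 always holds, but result4 == result1 does not necessarily hold, which is why I need an option to force a cache update for the same input arguments.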

How can I approach this?

Note about `joblib`

As far as I know, joblib supports .call, which forces a re-run but does not update the cache.

Follow-up using `klepto`:

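Is there a way to make klepto (see @Wally's answer) cache its results by default in a specific location (e.g. /some/path/) and share that location across several functions? For example, I would like to write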

cache_path = "/some/path/"

and then @memoize several functions in a given module under that same path.

Answer 1

I'd suggest looking at joblib and klepto. Both have very configurable caching algorithms that can do what you want.

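Both can certainly cache result1 and result2, and klepto gives you access to the cache, so you can pop a result from the local in-memory cache (without removing it from a stored archive, such as a database).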

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> hasher = hashmap(algorithm='md5')
>>> @memoize(keymap=hasher)
... def squared(x):
...   print("called")
...   return x**2
... 
>>> squared(1)
called
1
>>> squared(2)
called
4
>>> squared(3)
called
9
>>> squared(2)
4
>>> 
>>> cache = squared.__cache__()
>>> # delete the 'key' for x=2
>>> cache.pop(squared.key(2))
4
>>> squared(2)
called
4

Not exactly the keyword interface you're looking for, but it has the functionality you're after.
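If you would rather not depend on klepto, the same "pop one entry" idea can be sketched with a plain dictionary. This is a minimal in-memory sketch, assuming all arguments are hashable; the `cache` and `key` attributes are hypothetical names chosen to echo klepto's `__cache__()` and `key()`, not an existing API.

```python
import functools


def memoize(func):
    """In-memory memoization with the cache exposed for manual edits."""
    cache = {}

    def make_key(*args, **kwargs):
        # Positional args plus sorted keyword args (all must be hashable)
        return (args, tuple(sorted(kwargs.items())))

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = make_key(*args, **kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    # Hypothetical attribute names, chosen to echo klepto's __cache__()/key()
    wrapper.cache = cache
    wrapper.key = make_key
    return wrapper


calls = []  # records real executions


@memoize
def squared(x):
    calls.append(x)
    return x ** 2


squared(2)                         # computed
squared(2)                         # served from the cache
squared.cache.pop(squared.key(2))  # evict only the entry for x=2
squared(2)                         # computed again
```

Popping an entry and then calling again has the same effect as a `force_update=True` call for that one set of arguments.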

Answer 2

You can do it like this:

import functools
import pickle


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def decorator(*args, **kwargs):
        # Remove the control flag so it is never passed to func
        force_update = kwargs.pop('force_update', False)
        # Sort keyword arguments so f(a=1, b=2) and f(b=2, a=1) share a key
        key = pickle.dumps((args, sorted(kwargs.items())))
        if force_update or key not in cache:
            res = func(*args, **kwargs)
            cache[key] = res
        else:
            res = cache[key]
        return res
    return decorator

The decorator accepts an extra argument, force_update (you don't need to declare it in your function). It is popped from kwargs before the wrapped function is called, so you can call the function either without this argument or with force_update=True:

import random


@memoize
def f(a=0, b=0, c=0):
    return [a, b, c, random.randint(1, 10)]


>>> print(f(a=1, b=2, c=3))
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache
[1, 2, 3, 9]
>>> print(f(a=1, b=2, c=3, force_update=True))
[1, 2, 3, 2]
>>> print(f(a=1, b=2, c=3))  # value will be taken from the cache as well
[1, 2, 3, 2]
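Answer 2 builds its key from the arguments exactly as passed, so `f(1, 2, 3)` and `f(a=1, b=2, c=3)` are cached separately. A possible refinement, sketched below, normalizes each call with `inspect.signature(...).bind()` and `apply_defaults()` before building the key. It assumes the decorated function has no parameter of its own named `force_update`, no `**kwargs` parameter, and only hashable argument values; `add3` is a hypothetical example function.

```python
import functools
import inspect


def memoize(func):
    """Memoize func; call with force_update=True to recompute and overwrite."""
    sig = inspect.signature(func)
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, force_update=False, **kwargs):
        # Bind to func's signature and fill in defaults, so add3(1, 2, 3),
        # add3(a=1, b=2, c=3) and add3(1, c=3, b=2) produce the same key.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(bound.arguments.items())
        if force_update or key not in cache:
            cache[key] = func(*bound.args, **bound.kwargs)
        return cache[key]

    return wrapper


calls = []  # records real executions


@memoize
def add3(a, b=0, c=0):
    calls.append((a, b, c))
    return a + b + c


add3(1, 2, 3)                     # computed
add3(a=1, b=2, c=3)               # same key: served from the cache
add3(1, 2, 3, force_update=True)  # recomputed, cache entry overwritten
```

Making `force_update` a keyword-only parameter of the wrapper keeps it out of the arguments forwarded to the decorated function without an explicit `kwargs.pop`.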

Answer 3

If you want to do it yourself:

def memoize(func):
    cache = {}
    def cacher(a, b, c, force_update=False):
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]
    return cacher
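To check Answer 3 against the scenario in the question, the sketch below reuses its decorator and replaces the slow body with a hypothetical stand-in that returns a different value on every real call (a counter from `itertools.count`), so cache hits and forced recomputations are visible. Note that `cacher` only accepts parameters named `a`, `b` and `c`.

```python
import itertools


def memoize(func):
    cache = {}

    def cacher(a, b, c, force_update=False):
        # Recompute on a cache miss or when the caller forces it
        if force_update or (a, b, c) not in cache:
            cache[(a, b, c)] = func(a, b, c)
        return cache[(a, b, c)]

    return cacher


_counter = itertools.count()


@memoize
def my_function(a, b, c):
    # Hypothetical stand-in for a slow, non-deterministic computation:
    # every real call returns a different second element
    return (a + b + c, next(_counter))


result1 = my_function(a=1, b=2, c=3)                     # runs
result2 = my_function(a=1, b=2, c=3)                     # read from cache
result3 = my_function(a=1, b=2, c=3, force_update=True)  # runs again
result4 = my_function(a=1, b=2, c=3)                     # read from cache
```

Here result4 equals result3 while differing from result1, which is exactly the behaviour the question asks for.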

Answer 4

This is purely for the klepto follow-up question...

The following extends @Wally's example to specify a directory:

>>> import klepto
>>> from klepto import lru_cache as memoize
>>> from klepto.keymaps import hashmap
>>> from klepto.archives import dir_archive
>>> hasher = hashmap(algorithm='md5')
>>> dir_cache = dir_archive('/tmp/some/path/squared')
>>> dir_cache2 = dir_archive('/tmp/some/path/tripled')
>>> @memoize(keymap=hasher, cache=dir_cache)
... def squared(x):
...   print("called")
...   return x**2
>>> 
>>> @memoize(keymap=hasher, cache=dir_cache2)
... def tripled(x):
...   print('called')
...   return 3*x
>>>

Alternatively, you can use file_archive and specify the path as:

from klepto.archives import file_archive
cache = file_archive('/tmp/some/path/file.py')
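For the cache_path part of the follow-up without klepto, a decorator factory can write one pickle file per call into a shared directory. This is a minimal sketch, assuming a single process (no file locking), picklable arguments and results, and a temporary directory standing in for "/some/path/". Keying on pickled arguments is a simplification: equal arguments normally pickle to the same bytes, but this is not guaranteed, so an occasional unnecessary recomputation is possible.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def memoize(cache_path):
    """Decorator factory: persist results as pickle files under cache_path."""
    os.makedirs(cache_path, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, force_update=False, **kwargs):
            # Hash the function identity plus its arguments; including the
            # qualified name keeps functions from colliding in one directory.
            payload = pickle.dumps((func.__module__, func.__qualname__,
                                    args, sorted(kwargs.items())))
            digest = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_path, digest + ".pkl")
            if not force_update and os.path.exists(path):
                with open(path, "rb") as fh:  # cache hit: load stored result
                    return pickle.load(fh)
            result = func(*args, **kwargs)  # miss or forced: recompute
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


# Stand-in for cache_path = "/some/path/" from the question
cache_path = tempfile.mkdtemp()
calls = []  # records real executions, to make cache hits visible


@memoize(cache_path)
def squared(x):
    calls.append(x)
    return x ** 2


@memoize(cache_path)
def tripled(x):
    calls.append(x)
    return 3 * x
```

Both functions share the directory but get their own files, because the module and qualified function name are part of the hashed key; passing `force_update=True` recomputes and overwrites the stored file.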
