memoize to disk - python - persistent memoization

Date: 2013-05-09 14:00:05

Tags: python memoization

Is there a way to memoize the output of a function to disk?

I have a function

def getHtmlOfUrl(url):
    ... # expensive computation

and would like to do something like:

def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")

and then call getHtmlMemoized(url), so that the expensive computation is done only once for each url.

9 answers:

Answer 0 (score: 24)

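Python offers a very elegant way to do this: decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function's source code. Your decorator can be written like this: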

import json

def persist_to_file(file_name):

    def decorator(original_func):

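        # Load the existing cache, or start empty if the file is missing or invalid
        try:
            with open(file_name, 'r') as f:
                cache = json.load(f)
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                # Rewrite the whole cache file after every new result
                with open(file_name, 'w') as f:
                    json.dump(cache, f)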
            return cache[param]

        return new_func

    return decorator

Once you have that, 'decorate' the function using the @-syntax and you are ready.

@persist_to_file('cache.dat')
def html_of_url(url):
    ...  # your function code

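Note that this decorator is intentionally simplified and may not work for every situation, for example when the source function accepts or returns data that cannot be JSON-serialized.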
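One way to relax the single-argument limitation is to serialize the whole call signature into a string key. The following is only a sketch under that assumption: the name persist_to_file_multi and the demo function add are hypothetical, and arguments and return values must still be JSON-serializable.

```python
import functools
import json
import os
import tempfile


def persist_to_file_multi(file_name):
    """Hypothetical variant of persist_to_file that accepts several arguments."""
    try:
        with open(file_name, 'r') as f:
            cache = json.load(f)
    except (IOError, ValueError):
        cache = {}

    def decorator(original_func):
        @functools.wraps(original_func)
        def new_func(*args, **kwargs):
            # JSON object keys must be strings, so serialize the call signature.
            key = json.dumps([args, kwargs], sort_keys=True)
            if key not in cache:
                cache[key] = original_func(*args, **kwargs)
                with open(file_name, 'w') as f:
                    json.dump(cache, f)
            return cache[key]
        return new_func
    return decorator


calls = []  # records every real (uncached) invocation
cache_file = os.path.join(tempfile.mkdtemp(), 'cache.json')


@persist_to_file_multi(cache_file)
def add(a, b=0):
    calls.append((a, b))
    return a + b


first = add(1, b=2)
second = add(1, b=2)  # same key, so add() is not run again
```

Note that add(1, 2) and add(1, b=2) produce different keys in this sketch, so they are cached separately.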

More on decorators: How to make a chain of function decorators?

And here is how to make the decorator save the cache only once, at exit time:

import json, atexit

def persist_to_file(file_name):

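    try:
        with open(file_name, 'r') as f:
            cache = json.load(f)
    except (IOError, ValueError):
        cache = {}

    def save_cache():
        with open(file_name, 'w') as f:
            json.dump(cache, f)

    # Write the cache to disk once, when the interpreter exits
    atexit.register(save_cache)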

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator

Answer 1 (score: 14)

Check out joblib.Memory. It is a library for doing exactly that.

Answer 2 (score: 3)

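A cleaner solution powered by Python's shelve module. The advantage is that the cache is updated in real time through the well-known dict syntax, and it is also exception-proof (no need to handle an annoying KeyError).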

import shelve

def shelve_it(file_name):
    # The shelf stays open for the lifetime of the process
    d = shelve.open(file_name)

    def decorator(func):
        def new_func(param):
            # shelve keys must be strings, so param has to be a str
            if param not in d:
                d[param] = func(param)
            return d[param]

        return new_func

    return decorator

@shelve_it('cache.shelve')
def expensive_function(param):
    pass

This ensures the function is computed only once. Subsequent calls return the stored result.
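Since shelve only accepts string keys, a function that takes non-string or multiple arguments needs its arguments turned into a string first. Below is a minimal sketch, assuming repr() of the arguments is a stable and unique enough key; the names shelve_it_any_args and square are hypothetical.

```python
import functools
import os
import shelve
import tempfile


def shelve_it_any_args(file_name):
    """Hypothetical variant of shelve_it keyed by the repr() of the arguments."""
    def decorator(func):
        @functools.wraps(func)
        def new_func(*args, **kwargs):
            # shelve keys must be str, so build one from the call arguments.
            key = repr((args, sorted(kwargs.items())))
            # Open and close the shelf on each call so the data is flushed to disk.
            with shelve.open(file_name) as d:
                if key not in d:
                    d[key] = func(*args, **kwargs)
                return d[key]
        return new_func
    return decorator


calls = []  # records every real (uncached) invocation
shelf_path = os.path.join(tempfile.mkdtemp(), 'cache.shelve')


@shelve_it_any_args(shelf_path)
def square(n):
    calls.append(n)
    return n * n


results = [square(3), square(3), square(4)]
```

Opening the shelf on every call is slower than keeping it open, but it avoids relying on the interpreter to close the file at exit.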

Answer 3 (score: 0)

Something like this should do:

import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

    def load_memo(self, filename):
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        # JSON object keys must be strings, so serialize the argument tuple
        key = json.dumps(args)
        if key not in self.memo:
            self.memo[key] = self.func(*args)
        return self.memo[key]

Basic usage:

your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
#  do your stuff with your_mem_func

If you want to write the "cache" to a file after using it, so it can be loaded again later:

your_mem_func.save_memo('yournewdata.json')

Answer 4 (score: 0)

The Artemis library has a module for this (you need to install it first).

You decorate your function:

from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c = None):
    results = ...
    return results

Internally, it makes a hash out of the input arguments and saves memo files keyed by this hash.
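The hash-per-call idea can be illustrated with the standard library alone. This is a sketch of the general technique, not Artemis's actual implementation; memoize_by_hash and the demo function are hypothetical, and it assumes the arguments pickle to the same bytes on every run.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def memoize_by_hash(cache_dir):
    """Illustrative only: one pickle file per distinct set of arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the pickled arguments to get a file name for this call.
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_dir, digest + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []  # records every real (uncached) invocation
memo_dir = tempfile.mkdtemp()


@memoize_by_hash(memo_dir)
def fcn(a, b, c=None):
    calls.append((a, b, c))
    return {'sum': a + b, 'c': c}


r1 = fcn(1, 2, c='x')
r2 = fcn(1, 2, c='x')  # same hash, read back from disk
r3 = fcn(1, 3)         # new arguments, new memo file
```

Storing one file per call means a new result never requires rewriting the whole cache, at the cost of many small files.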

Answer 5 (score: 0)

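Assuming your data is JSON-serializable, this code should work:

import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            # Note: one cached result per file, regardless of the arguments
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                # Compute first, so a failing call does not leave an empty file
                ret = function(*args, **kwargs)
                with open(fname, 'w') as f:
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator

Decorate your function with json_file, and then simply call it; if it has been run before, you will get the cached data.

Checked with Python 2.x and Python 3.x.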

Answer 6 (score: 0)

There is also diskcache:

from diskcache import Cache

cache = Cache("cachedir")

@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y

Answer 7 (score: 0)

Most of the answers take the decorator approach. But maybe I don't want to cache the result every time the function is called.

I came up with a solution using a context manager, so the function can be called as

with DiskCacher('cache_id', myfunc) as myfunc2:
    res=myfunc2(...)

when the caching functionality is needed.

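The "cache_id" string is used to distinguish the data files, which are named [calling_script]_[cache_id].nc. So if you do this in a loop, you need to incorporate the loop variable into this cache_id, otherwise the data will be overwritten.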

Alternatively:

myfunc2=DiskCacher('cache_id')(myfunc)
res=myfunc2(...)

Or (this is probably less useful, since the same ID is used every time):

@DiskCacher('cache_id')
def myfunc(*args):
    ...

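The full code with examples (I use pickle for saving/loading, but this can be changed to any save/read method. Note that this also assumes the function in question returns only 1 return value):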

from __future__ import print_function
import sys, os
import functools

def formFilename(folder, varid):
    '''Compose abspath for cache file

    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is the format:
                [script_file]_[varid].nc
    '''
    # Use only the script's base name so the file name has no path separators
    script_file=os.path.splitext(os.path.basename(sys.argv[0]))[0]
    name='[%s]_[%s].nc' %(script_file, varid)
    abpath=os.path.join(folder, name)

    return abpath


def readCache(folder, varid, verbose=True):
    '''Read cached data

    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
    import pickle
    abpath_in=formFilename(folder, varid)
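    if os.path.exists(abpath_in):
        if verbose:
            print('\n# <readCache>: Read in variable', varid,
                    'from disk cache:\n', abpath_in)
        with open(abpath_in, 'rb') as fin:
            results=pickle.load(fin)
    else:
        # Avoid an UnboundLocalError when there is no cache file yet
        raise IOError('Cache file not found: %s' % abpath_in)

    return results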


def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache

    Args:
        results (tuple): a tuple containing data to write to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out=formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n',abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)

    return


class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
            verbose=True):
        '''Disk cache context manager

        Args:
            varid (str): string id used to save cache.
                function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''

        if folder is None:
            self.folder='/tmp/cache/'
        else:
            self.folder=folder

        self.func=func
        self.varid=varid
        self.overwrite=overwrite
        self.verbose=verbose

    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")

        return _Cache2Disk(self.func, self.varid, self.folder,
                self.overwrite, self.verbose)

    def __exit__(self, type, value, traceback):
        return

    def __call__(self, func=None):
        _func=func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                self.verbose)



def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function

    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If needs to recompute,
            the function is wrapped that the return values are saved to disk
            before returning.
    '''

    def decorator_func(func):
        abpath_in=formFilename(folder, varid)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results=readCache(folder, varid, verbose)
            else:
                results=func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper

    return decorator_func(func)



if __name__=='__main__':

    data=range(10)  # dummy data

    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results=[i*n for i in data]
        return results

    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    #--------------Return a new function--------------
    def func2(data, n):
        results=[i*n for i in data]
        return results

    print('\n### Wrap a new function, 1st time call')
    func2b=DiskCacher('new_func')(func2)
    res=func2b(data, 10)
    print('res =', res)

    print('\n### Wrap a new function, 2nd time call')
    res=func2b(data, 10)
    print('res =', res)

    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results=[i*n for i in data]
        return results

    print('\n### pie decorator, 1st time call')
    res=func3(data, 10)
    print('res =', res)

    print('\n### pie decorator, 2nd time call.')
    res=func3(data, 10)
    print('res =', res)

Output:

### Context manager, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 2nd time call

# <readCache>: Read in variable context_mananger from disk cache:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 3rd time call with overwrite=True

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 2nd time call

# <readCache>: Read in variable new_func from disk cache:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 2nd time call.

# <readCache>: Read in variable pie_dec from disk cache:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Answer 8 (score: -1)

You can use the cache_to_disk package:

    from cache_to_disk import cache_to_disk

    @cache_to_disk(3)
    def my_func(a, b, c, d=None):
        results = ...
        return results

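This caches the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and are unpickled and returned the next time the function is called. After 3 days, the pickle file is deleted until the function is rerun. The function is rerun whenever it is called with new arguments. More info here: https://github.com/sarenehan/cache_to_disk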
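To make the expiry behaviour concrete, here is a rough standard-library sketch of a time-limited disk cache. It is not how cache_to_disk is implemented; cache_for_days is a hypothetical name, and the sketch assumes the cache file's modification time is a good enough measure of its age.

```python
import functools
import hashlib
import os
import pickle
import tempfile
import time


def cache_for_days(days, cache_dir):
    """Hypothetical stand-in: reuse a pickled result until it is `days` old."""
    max_age = days * 24 * 60 * 60  # seconds

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + '.pkl')
            # Reuse the file only if it exists and is younger than max_age.
            if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []  # records every real (uncached) invocation
expiry_dir = tempfile.mkdtemp()


@cache_for_days(3, expiry_dir)
def my_func(a, b):
    calls.append((a, b))
    return a * b


my_func(2, 5)
my_func(2, 5)  # fresh cache file: served from disk
# Simulate 4 days passing by backdating the file's modification time.
for name in os.listdir(expiry_dir):
    old = time.time() - 4 * 24 * 60 * 60
    os.utime(os.path.join(expiry_dir, name), (old, old))
value = my_func(2, 5)  # stale cache file: recomputed and rewritten
```

Unlike the package described above, this sketch never deletes stale files; it simply overwrites them on the next call.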