Is there a way to memoize the output of a function to disk?
I have a function
def getHtmlOfUrl(url):
    ...  # expensive computation
and would like to do something like:
getHtmlMemoized = memoizeToFile(getHtmlOfUrl, "file.dat")
and then call getHtmlMemoized(url), so that the expensive computation is done only once for each url.
Answer 0 (score: 24)
Python offers a very elegant way to do this: decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function's source code. Your decorator can be written like this:
import json

def persist_to_file(file_name):

    def decorator(original_func):

        try:
            cache = json.load(open(file_name, 'r'))
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                json.dump(cache, open(file_name, 'w'))
            return cache[param]

        return new_func

    return decorator
Once you have that, 'decorating' the function using the @-syntax is all it takes:
@persist_to_file('cache.dat')
def html_of_url(url):
    your function code...
Note that this decorator is intentionally simplified and may not work in every situation, for example when the source function accepts or returns data that cannot be json-serialized.
More on decorators: How to make a chain of function decorators?
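If the arguments or return values are not json-serializable, the same idea can be sketched with pickle instead. This is a minimal sketch, and the decorator name persist_to_pickle is mine rather than from the answer:

import os
import pickle

def persist_to_pickle(file_name):
    def decorator(original_func):
        # Load an existing cache if present, otherwise start empty.
        if os.path.exists(file_name):
            with open(file_name, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                # Rewrite the whole cache after every new result.
                with open(file_name, 'wb') as f:
                    pickle.dump(cache, f)
            return cache[param]

        return new_func
    return decorator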
And here is how to make the decorator save the cache only once, at exit time:
import json, atexit

def persist_to_file(file_name):

    try:
        cache = json.load(open(file_name, 'r'))
    except (IOError, ValueError):
        cache = {}

    atexit.register(lambda: json.dump(cache, open(file_name, 'w')))

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator
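For example, applied to the question's URL-fetching function, usage might look like this (urllib is used purely for illustration and is not part of the answer):

import urllib.request

@persist_to_file('html_cache.json')
def get_html_of_url(url):
    # The expensive computation: fetch and decode the page body.
    return urllib.request.urlopen(url).read().decode('utf-8')

html = get_html_of_url('https://example.com')   # computed once
html = get_html_of_url('https://example.com')   # served from the cache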
Answer 1 (score: 14)
Check out joblib.Memory. It is a library for doing exactly that.
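A minimal sketch of typical usage (the cache directory name is arbitrary):

from joblib import Memory

memory = Memory("cachedir", verbose=0)  # results are stored under ./cachedir

@memory.cache
def getHtmlOfUrl(url):
    ...  # expensive computation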
Answer 2 (score: 3)
A cleaner solution powered by Python's shelve module. The advantage is that the cache is updated in real time via the well-known dict syntax, and it is also exception-proof (no need to handle the annoying KeyError).
import shelve

def shelve_it(file_name):
    d = shelve.open(file_name)

    def decorator(func):
        def new_func(param):
            if param not in d:
                d[param] = func(param)
            return d[param]

        return new_func

    return decorator

@shelve_it('cache.shelve')
def expensive_function(param):
    pass
The function will be computed just once; subsequent calls will return the stored result.
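For illustration, a hypothetical pair of calls (not part of the answer); note that shelve keys must be strings, so the cached parameter should be a str:

result = expensive_function('some-key')   # computed and written to cache.shelve
result = expensive_function('some-key')   # returned straight from the shelf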
Answer 3 (score: 0)
Something like this should do it:
import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

    def load_memo(self, filename):
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.func(*args)
        return self.memo[args]
Basic usage:
your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
# do your stuff with your_mem_func
And if you want to write your "cache" to a file after using it, to be loaded again in the future:
your_mem_func.save_memo('yournewdata.json')
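One caveat worth noting (my addition, not part of the answer): the memo keys are argument tuples, and json cannot serialize dicts with tuple keys, so save_memo will fail once the memo is populated. A minimal workaround sketch converts the keys to strings on save and back on load (helper names are hypothetical):

import ast
import json

def save_memo_compat(memo, filename):
    # json object keys must be strings, so store the repr of each argument tuple
    with open(filename, 'w') as f:
        json.dump({repr(k): v for k, v in memo.items()}, f)

def load_memo_compat(filename):
    # rebuild the original tuple keys from their repr
    with open(filename) as f:
        return {ast.literal_eval(k): v for k, v in json.load(f).items()}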
Answer 4 (score: 0)
The Artemis library has a module for this. You decorate your function:

from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c=None):
    results = ...
    return results
Internally, it creates a hash out of the input arguments and saves memo files based on this hash.
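The general idea, shown here as a rough sketch of the technique rather than Artemis's actual implementation, looks something like this:

import hashlib
import os
import pickle

def memoize_to_disk_sketch(func, cache_dir='memo_cache'):
    """Cache results in files named after a hash of the arguments."""
    os.makedirs(cache_dir, exist_ok=True)

    def wrapper(*args, **kwargs):
        # Hash a stable representation of the arguments to build the file name.
        key = hashlib.md5(repr((args, sorted(kwargs.items()))).encode()).hexdigest()
        path = os.path.join(cache_dir, key + '.pkl')
        if os.path.exists(path):
            with open(path, 'rb') as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result

    return wrapper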
Answer 5 (score: 0)
Assuming your data is json serializable, this code should work:
import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                with open(fname, 'w') as f:
                    ret = function(*args, **kwargs)
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator
Decorate your function with it and then just call it; if it has been run previously, you will get your cached data.
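For example (the function and file name here are purely illustrative):

@json_file('slow_result.json')
def slow_computation():
    return {'answer': 42}

print(slow_computation())  # computed on the first run, loaded from slow_result.json afterwards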
Tested with Python 2.x and Python 3.x.
Answer 6 (score: 0)
There is also diskcache.
from diskcache import Cache

cache = Cache("cachedir")

@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y
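Calling it twice with the same arguments shows the effect (my illustration, not part of the answer):

f(3, 4)  # prints "Running f(3, 4)" and stores the result in cachedir
f(3, 4)  # returns (3, 4) from the cache without printing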
Answer 7 (score: 0)
Most of the answers are in a decorator fashion. But maybe I don't want to cache the result every time the function is called.
I made one solution using a context manager, so the function can be called as

with DiskCacher('cache_id', myfunc) as myfunc2:
    res = myfunc2(...)

whenever you need the caching functionality.
The 'cache_id' string is used to distinguish data files, which are named [calling_script]_[cache_id].dat. So if you are doing this in a loop, you need to incorporate the looping variable into this cache_id, otherwise the data will be overwritten.
Alternatively:

myfunc2 = DiskCacher('cache_id')(myfunc)
res = myfunc2(...)

Or (this is probably not quite as useful, since the same id is used all the time):

@DiskCacher('cache_id')
def myfunc(*args):
    ...
The complete code with examples (I use pickle to save/load, but this can be changed to any other save/read method. NOTE that this also assumes the function in question returns only one return value):
from __future__ import print_function
import sys, os
import functools

def formFilename(folder, varid):
    '''Compose abspath for cache file

    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is the format:
                [script_file]_[varid].dat
    '''
    script_file=os.path.splitext(sys.argv[0])[0]
    name='[%s]_[%s].nc' %(script_file, varid)
    abpath=os.path.join(folder, name)

    return abpath


def readCache(folder, varid, verbose=True):
    '''Read cached data

    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
    import pickle
    abpath_in=formFilename(folder, varid)
    if os.path.exists(abpath_in):
        if verbose:
            print('\n# <readCache>: Read in variable', varid,
                  'from disk cache:\n', abpath_in)
        with open(abpath_in, 'rb') as fin:
            results=pickle.load(fin)

    return results


def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache

    Args:
        results (tuple): a tuple containing data read to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out=formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n', abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)

    return


class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
                 verbose=True):
        '''Disk cache context manager

        Args:
            varid (str): string id used to save cache.
                function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''
        if folder is None:
            self.folder='/tmp/cache/'
        else:
            self.folder=folder

        self.func=func
        self.varid=varid
        self.overwrite=overwrite
        self.verbose=verbose

    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")

        return _Cache2Disk(self.func, self.varid, self.folder,
                           self.overwrite, self.verbose)

    def __exit__(self, type, value, traceback):
        return

    def __call__(self, func=None):
        _func=func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                           self.verbose)


def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function

    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If needs to recompute,
            the function is wrapped that the return values are saved to disk
            before returning.
    '''
    def decorator_func(func):
        abpath_in=formFilename(folder, varid)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results=readCache(folder, varid, verbose)
            else:
                results=func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper

    return decorator_func(func)



if __name__=='__main__':

    data=range(10)  # dummy data

    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results=[i*n for i in data]
        return results

    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    #--------------Return a new function--------------
    def func2(data, n):
        results=[i*n for i in data]
        return results

    print('\n### Wrap a new function, 1st time call')
    func2b=DiskCacher('new_func')(func2)
    res=func2b(data, 10)
    print('res =', res)

    print('\n### Wrap a new function, 2nd time call')
    res=func2b(data, 10)
    print('res =', res)

    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results=[i*n for i in data]
        return results

    print('\n### pie decorator, 1st time call')
    res=func3(data, 10)
    print('res =', res)

    print('\n### pie decorator, 2nd time call.')
    res=func3(data, 10)
    print('res =', res)
The output:
### Context manager, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 2nd time call
# <readCache>: Read in variable context_mananger from disk cache:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 3rd time call with overwrite=True
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 2nd time call
# <readCache>: Read in variable new_func from disk cache:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 2nd time call.
# <readCache>: Read in variable pie_dec from disk cache:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
Answer 8 (score: -1)
You can use the cache_to_disk package:
from cache_to_disk import cache_to_disk

@cache_to_disk(3)
def my_func(a, b, c, d=None):
    results = ...
    return results
This will cache the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and are unpickled and returned the next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function will be re-run whenever it is called with new arguments. More information here: https://github.com/sarenehan/cache_to_disk