Question

我有一个具有确定性结果的python函数。运行并生成大量输出需要很长时间：

def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result

我不时修改time_consuming_function，但我希望避免在它不变的情况下再次运行。 [time_consuming_function仅取决于为此目的考虑不可变的函数;也就是说，它可能有来自Python库的函数，但不能来自我改变的其他代码片段。]向我提出的解决方案是缓存输出并缓存函数的一些“哈希”。如果哈希值发生变化，则该函数将被修改，我们必须重新生成输出。

这可能还是荒谬？

更新：根据答案，看起来我想要做的是“memoize”time_consuming_function，除了代替（或除了）传递给不变量的参数功能，我想说明一个本身会改变的功能。

Answer 1

如果我理解你的问题，我想我会像这样解决它。这是一个触动的邪恶，但我认为它比我在这里看到的其他解决方案更可靠和更有针对性。

import inspect
import functools
import json

def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}

        source = inspect.getsource(f)

        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)

            return cache[source]
        return wrapper
    return decorator

@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result

Answer 2

我不是将函数放在字符串中，而是将函数放在自己的文件中。例如，将其称为time_consuming.py。它看起来像这样：

def time_consuming_method():
   # your existing method here

# Is the cached data older than this file?
if (not os.path.exists(data_file_name) 
    or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)

# redefine method
def time_consuming_method():
    return data

在测试基础设施以实现此功能时，我会评论出缓慢的部分。创建一个只返回0的简单函数，让所有的保存/加载工作满意，然后将慢速位置放回去。

Answer 3

所以，这是使用装饰器的一个非常巧妙的技巧：

def memoize(f):
    cache={};
    def result(*args):
        if args not in cache:
           cache[args]=f(*args);
        return cache[args];
    return result;

通过上述内容，您可以使用：

@memoize
def myfunc(x,y,z):
   # Some really long running computation

当你调用myfunc时，你实际上会调用它的memoized版本。挺整洁的，对吧？无论何时想要重新定义函数，只需再次使用“@memoize”，或者明确地写：

myfunc = memoize(new_definition_for_myfunc);

修改的
我没有意识到你想在多次运行之间缓存。在这种情况下，您可以执行以下操作：

import os;
import os.path;
import cPickle;

class MemoizedFunction(object):
     def __init__(self,f):
         self.function=f;
         self.filename=str(hash(f))+".cache";
         self.cache={};
         if os.path.exists(self.filename):
             with open(filename,'rb') as file:
                 self.cache=cPickle.load(file);

     def __call__(self,*args):
         if args not in self.cache:
             self.cache[args]=self.function(*args);
         return self.cache[args];

     def __del__(self):
         with open(self.filename,'wb') as file:
              cPickle.dump(self.cache,file,cPickle.HIGHEST_PROTOCOL);

def memoize(f):
    return MemoizedFunction(f);

Answer 4

第一部分是查找表的记忆和序列化。基于一些python序列化库，这应该是直截了当的。第二部分是您希望在源代码更改时删除序列化查找表。也许这被推翻到一些奇特的解决方案中。大概是当你改变代码时，你会在某个地方检查它？为什么不在删除序列化表的checkin例程中添加一个钩子？或者，如果这不是研究数据并且正在生产中，请将其作为发布过程的一部分，如果文件的修订号（将此函数放在其自己的文件中）发生更改，则发布脚本将删除序列化查找表。

Answer 5

您所描述的内容实际上是memoization。可以通过定义装饰器来记忆大多数常用功能。

一个（过于简化）的例子：

def memoized(f):
    cache={}
    def memo(*args):
        if args in cache:
            return cache[args]
        else:
            ret=f(*args)
            cache[args]=ret
            return ret
    return memo

@memoized
def time_consuming_method():
    # lots_of_computing_time to come up with the_result
    return the_result

编辑：

从Mike Graham的评论和OP的更新中，现在很清楚，需要在程序的不同运行中缓存值。这可以通过使用缓存的一些持久存储来完成（例如，使用Pickle或简单文本文件这样简单的东西，或者可能使用完整的数据库，或者介于两者之间的任何东西）。选择使用哪种方法取决于OP需要什么。其他几个答案已经给出了一些解决方案，所以我不打算在此重复。

散列python函数以在修改函数时重新生成输出

5 个答案: