Question

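How can I use functools' lru_cache inside classes without leaking memory? In the following minimal example, the foo instance is not released even though it has gone out of scope and has no referrers (other than the lru_cache).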

from functools import lru_cache
class BigClass:
    pass
class Foo:
    def __init__(self):
        self.big = BigClass()
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10)) # use cache
    return 'something'

fun()

But foo, and therefore foo.big (a BigClass), is still alive:

import gc; gc.collect()  # collect garbage
len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]) # is 1

That means the Foo/BigClass instances still reside in memory. Even deleting Foo (del Foo) does not release them.

Why does lru_cache hold on to the instance at all? Doesn't the cache use some hash rather than the actual object?

What is the recommended way to use lru_cache inside classes?

I know of two workarounds: Use per instance caches or make the cache ignore object (which might lead to wrong results).
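On the "why" part: lru_cache builds its key from the call arguments themselves, and self is simply the first argument. The key keeps the actual objects, not just their hash, because a hash match still has to be confirmed with ==, so every cached entry holds a strong reference to its instance. A minimal sketch, using the question's Foo, showing that the instance is freed only once its entries leave the cache:

```python
import functools
import gc
import weakref


class Foo:
    @functools.lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5


foo = Foo()
foo.cached_method(10)
probe = weakref.ref(foo)  # observes foo without keeping it alive
del foo
gc.collect()
print(probe() is not None)  # True: the cache key (foo, 10) still references foo

Foo.cached_method.cache_clear()  # drops the cached entries of *all* instances
gc.collect()
print(probe() is None)  # True: the last strong reference is gone
```

Note that cache_clear() is class-wide, which is exactly why it is not a good per-instance cleanup tool.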

Answer 1

This is not the cleanest solution, but it is completely transparent to the programmer:

import functools
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # We're storing the wrapped method inside the instance. If we had
            # a strong reference to self the instance would never die.
            self_weak = weakref.ref(self)
            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)
            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator

It takes exactly the same parameters as lru_cache and works exactly the same way. However, it never passes self to lru_cache; instead it uses a per-instance lru_cache.
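A usage sketch with the question's Foo; the decorator is repeated so the snippet runs on its own. The first call installs a per-instance cache as an instance attribute, the second call is served from it, and once fun() returns the instance is freed by plain reference counting:

```python
import functools
import gc
import weakref


def memoized_method(*lru_args, **lru_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # Only a weak reference to self goes into the per-instance cache
            self_weak = weakref.ref(self)
            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)
            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator


class Foo:
    @memoized_method(maxsize=16)
    def cached_method(self, x):
        return x + 5


def fun():
    foo = Foo()
    first = foo.cached_method(10)   # miss: creates foo's own cache
    second = foo.cached_method(10)  # hit: the instance attribute now shadows the method
    return first, second, foo.cached_method.cache_info()


first, second, info = fun()
gc.collect()
alive = sum(isinstance(o, Foo) for o in gc.get_objects())
print(first, second, info.hits, info.misses, alive)  # 15 15 1 1 0
```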

Answer 2

For this use case, I'd like to introduce methodtools.

Install it with pip install methodtools (https://pypi.org/project/methodtools/).

Then your code works just by replacing functools with methodtools.

from methodtools import lru_cache
class Foo:
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

And of course the gc test also returns 0.

Answer 3

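Python 3.8 introduced the cached_property decorator in the functools module. In my testing, it does not seem to retain the instances.

If you don't want to upgrade to Python 3.8, you can use its source code. All you need to do is import RLock and create the _NOT_FOUND object. That is: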

from threading import RLock

_NOT_FOUND = object()

class cached_property:
    # https://github.com/python/cpython/blob/master/Lib/functools.py#L913
    ...
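One caveat worth spelling out: cached_property caches a single value per instance, so it can only replace lru_cache for methods that take no arguments besides self. It stores that value in the instance's own __dict__, so nothing outside the instance keeps the instance alive. A sketch (Python 3.8+) reusing the question's Foo/BigClass, where expensive is a hypothetical attribute:

```python
import functools
import gc


class BigClass:
    pass


class Foo:
    def __init__(self):
        self.big = BigClass()

    @functools.cached_property
    def expensive(self):
        # Runs once per instance; the result is then stored in self.__dict__
        print("computing...")
        return sum(range(1000))


def fun():
    foo = Foo()
    first = foo.expensive   # computes and stores the value on foo
    second = foo.expensive  # read straight from foo.__dict__, no recomputation
    return first, second, "expensive" in foo.__dict__


first, second, stored = fun()
gc.collect()
alive = sum(isinstance(o, Foo) for o in gc.get_objects())
print(first, second, stored, alive)  # 499500 499500 True 0
```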

Answer 4

Simple wrapper solution

Here's a wrapper that keeps a weak reference to the instance:

import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'
    def wrapper(func):

        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            return _func(weakref.ref(self), *args, **kwargs)

        return inner

    return wrapper

Example

Use it like this:

class Weather:
    "Lookup weather information on a government website"

    def __init__(self, station_id):
        self.station_id = station_id

    @weak_lru(maxsize=10)
    def climate(self, category='average_temperature'):
        print('Simulating a slow method call!')
        return self.station_id + category
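A self-contained run of the example, with weak_lru repeated so the snippet works on its own; the station id 'KSFO_' is made up for illustration. It checks that the second call is served from the cache and that the instance can be freed while its entry is still cached:

```python
import functools
import gc
import weakref


def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'
    def wrapper(func):
        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            return _func(weakref.ref(self), *args, **kwargs)

        return inner
    return wrapper


class Weather:
    def __init__(self, station_id):
        self.station_id = station_id

    @weak_lru(maxsize=10)
    def climate(self, category='average_temperature'):
        print('Simulating a slow method call!')
        return self.station_id + category


w = Weather('KSFO_')  # hypothetical station id
r1 = w.climate()      # prints the "slow" message: cache miss
r2 = w.climate()      # no message: served from the shared cache
probe = weakref.ref(w)
del w
gc.collect()
print(r1, r2, probe() is None)  # KSFO_average_temperature KSFO_average_temperature True
```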

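When to use

Since weak references add some overhead, you would only want to use this when the instances are large and the application can't wait for older, unused calls to age out of the cache.

Why this is better

Unlike the other answers, there is only one cache per method for the whole class, not one per instance. This matters if you want any benefit from the least-recently-used algorithm. With a single cache per method, you can set maxsize so that total memory use is bounded regardless of the number of live instances.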
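To check the bounded-memory claim, here is a variant of weak_lru that additionally exposes the shared cache's statistics via inner.cache_info (an addition for this sketch, not part of the answer). Fifty instances with hypothetical station ids each make one call, yet the single cache never grows past maxsize:

```python
import functools
import weakref


def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'
    def wrapper(func):
        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            return _func(weakref.ref(self), *args, **kwargs)

        # Added for this sketch only: expose the one shared cache's stats
        inner.cache_info = _func.cache_info
        return inner
    return wrapper


class Weather:
    def __init__(self, station_id):
        self.station_id = station_id

    @weak_lru(maxsize=10)
    def climate(self, category='average_temperature'):
        return self.station_id + category


# 50 live instances (hypothetical station ids), one call each
stations = [Weather(f'S{i}_') for i in range(50)]
for station in stations:
    station.climate()

info = Weather.climate.cache_info()
print(info.misses, info.currsize, info.maxsize)  # 50 10 10
```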

Handling mutable attributes

If any of the attributes used in the method are mutable, be sure to add __eq__() and __hash__() methods:

class Weather:
    "Lookup weather information on a government website"

    def __init__(self, station_id):
        self.station_id = station_id

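    def update_station(self, station_id):
        self.station_id = station_id

    def __eq__(self, other):
        # Compare by the attribute the cached method depends on
        if not isinstance(other, Weather):
            return NotImplemented
        return self.station_id == other.station_id

    def __hash__(self):
        return hash(self.station_id)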

Answer 5

A simpler solution to this problem is to declare the cache in the constructor rather than in the class definition:

from functools import lru_cache
import gc

class BigClass:
    pass
class Foo:
    def __init__(self):
        self.big = BigClass()
        self.cached_method = lru_cache(maxsize=16)(self.cached_method)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10)) # use cache
    return 'something'
    
if __name__ == '__main__':
    fun()
    gc.collect()  # collect garbage
    print(len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]))  # is 0
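Note that this version frees the instance through the cycle collector rather than through reference counting: the instance's cache wraps the bound method self.cached_method, which refers back to the instance. That is why the gc.collect() call above matters. A sketch that pauses automatic collection to make the cycle visible:

```python
import functools
import gc
import weakref


class Foo:
    def __init__(self):
        # The lru_cache wrapper stores the bound method, and the bound
        # method stores self: foo -> wrapper -> bound method -> foo
        self.cached_method = functools.lru_cache(maxsize=16)(self.cached_method)

    def cached_method(self, x):
        return x + 5


gc.disable()  # stop automatic cycle collection so the effect is observable
try:
    foo = Foo()
    foo.cached_method(10)
    probe = weakref.ref(foo)
    del foo
    survived_del = probe() is not None  # True: the cycle keeps foo alive
    gc.collect()                        # an explicit collection still runs
    freed_by_gc = probe() is None       # True: the cycle has been reclaimed
finally:
    gc.enable()

print(survived_del, freed_by_gc)  # True True
```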

Python functools lru_cache with class methods: release object
