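Python file caching

Time: 2012-03-24 19:11:17

Tags: python file caching

I am creating some objects from files (validators built from template xsd files, which in some cases pull together other xsd files), and I would like to recreate the objects when the file on disk changes.

I could create something like: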

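import os

def getobj(fname, cache={}):
    mtime = os.path.getmtime(fname)  # last time the file was written
    try:
        obj, lastloaded = cache[fname]
        if lastloaded < mtime:
            # stale entry: same stuff as in the except clause
            obj = create_from_file(fname)
            cache[fname] = (obj, mtime)
    except KeyError:
        obj = create_from_file(fname)
        cache[fname] = (obj, mtime)

    return obj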

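However, I would rather use code that someone else has already tested, if it exists. Is there an existing library that does something like this?

Update: I am using Python 2.7.1.

3 answers:

Answer 0: (score: 3)

Your code (including the caching logic) looks fine.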

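Consider moving the cache variable outside the function definition. That makes it possible to add other functions that clear or inspect the cache.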
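A minimal sketch of that suggestion, assuming a module-level dict named `_cache` and two hypothetical helpers, `clear_cache` and `cached_files`. The `create_from_file` shown here is only a stand-in that reads the file as text, not the asker's real loader:

```python
import os

_cache = {}  # maps filename -> (object, mtime seen at load time)


def create_from_file(fname):
    # Stand-in loader (an assumption): the real one would build a validator.
    with open(fname) as f:
        return f.read()


def getobj(fname):
    mtime = os.path.getmtime(fname)
    entry = _cache.get(fname)
    if entry is None or entry[1] != mtime:
        # Missing or stale: rebuild and remember the mtime we saw.
        entry = (create_from_file(fname), mtime)
        _cache[fname] = entry
    return entry[0]


def clear_cache():
    """Forget every cached object."""
    _cache.clear()


def cached_files():
    """Return the filenames currently held in the cache."""
    return sorted(_cache)
```

With the dict at module level, a test suite or a long-running process can call `clear_cache()` to force every object to be rebuilt on the next access.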

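If you want to see similar code, look at the source of the filecmp module: http://hg.python.org/cpython/file/2.7/Lib/filecmp.py The interesting part is how it uses the stat module to determine whether a file has changed. Here is the signature function: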

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)
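As a rough illustration of how such a signature could drive change detection, here is a sketch. The `has_changed` helper and the `signatures` dict are assumptions of this example and are not part of filecmp:

```python
import os
import stat


def _sig(st):
    # Same triple as filecmp: file type, size, and modification time.
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def has_changed(fname, signatures):
    """Return True if fname's signature differs from the stored one.

    `signatures` is a plain dict (an assumption of this sketch) that is
    updated in place with the latest signature. A file that has never
    been seen before counts as changed.
    """
    current = _sig(os.stat(fname))
    changed = signatures.get(fname) != current
    signatures[fname] = current
    return changed
```

Comparing the size as well as the mtime catches some rewrites that happen within the timestamp resolution of the filesystem, as long as the new content has a different length.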

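Answer 1: (score: 1)

Unless there is a specific reason to pass it as a parameter, I would make the cache a global object.

Answer 2: (score: 1)

Three thoughts.

  1. Use try... except... else for tidier control flow.

  2. File modification times are notoriously unreliable. In particular, they do not necessarily correspond to the most recent time the file was modified!

  3. Python 3 includes a caching decorator: functools.lru_cache. Here is the source.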

    def lru_cache(maxsize=100):
        """Least-recently-used cache decorator.
    
        If *maxsize* is set to None, the LRU features are disabled and the cache
        can grow without bound.
    
        Arguments to the cached function must be hashable.
    
        View the cache statistics named tuple (hits, misses, maxsize, currsize) with
        f.cache_info().  Clear the cache and statistics with f.cache_clear().
        Access the underlying function with f.__wrapped__.
    
        See:  http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
    
        """
        # Users should only access the lru_cache through its public API:
        #       cache_info, cache_clear, and f.__wrapped__
        # The internals of the lru_cache are encapsulated for thread safety and
        # to allow the implementation to change (including a possible C version).
    
        def decorating_function(user_function,
                    tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
    
            hits = misses = 0
            kwd_mark = (object(),)          # separates positional and keyword args
            lock = Lock()                   # needed because ordereddicts aren't threadsafe
    
            if maxsize is None:
                cache = dict()              # simple cache without ordering or size limit
    
                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        result = cache[key]
                        hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        cache[key] = result
                        misses += 1
                    return result
            else:
                cache = OrderedDict()       # ordered least recent to most recent
                cache_popitem = cache.popitem
                cache_renew = cache.move_to_end
    
                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        with lock:
                            result = cache[key]
                            cache_renew(key)        # record recent use of this key
                            hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        with lock:
                            cache[key] = result     # record recent use of this key
                            misses += 1
                            if len(cache) > maxsize:
                                cache_popitem(0)    # purge least recently used cache entry
                    return result
    
            def cache_info():
                """Report cache statistics"""
                with lock:
                    return _CacheInfo(hits, misses, maxsize, len(cache))
    
            def cache_clear():
                """Clear the cache and cache statistics"""
                nonlocal hits, misses
                with lock:
                    cache.clear()
                    hits = misses = 0
    
            wrapper.cache_info = cache_info
            wrapper.cache_clear = cache_clear
            return wrapper
    
        return decorating_function