Question

我可能需要在大型代码库上使用不同的工具进行多次读取。然后我认为在磁盘上读取这么多次是非常浪费的，而文本不会改变，所以我写了以下内容。

class Module(object):
    def __init__(self, module_path):
        self.module_path = module_path
        self._text = None
        self._ast = None

    @property
    def text(self):
        if not self._text:
            self._text = open(self.module_path).read()
        return self._text

    @property
    def ast(self):
        s = self.text  # which is actually discarded
        if not self._ast:
            self._ast = parse(self.text)

        return self._ast


class ContentDirectory(object):
    def __init__(self):
        self.content = {}

    def __getitem__(self, module_path):
        if module_path not in self.content:
            self.content[module_path] = Module(module_path)

        return self.content[module_path]

但现在问题来了，因为我想避免更改其余的代码，同时能够使用这个新技巧。

我看到的唯一方法是在可能使用的任何地方修补“open”内置函数。例如。

from myotherlib import __builtins__ as other_builtins
other_builtins.open = my_dummy_open  # which uses this cache

但这似乎并不是一个明智的想法。我应该放弃，只有尝试表现真的太糟糕了吗？

Answer 1

更换系统open()电话可能是一件坏事。它要求使用open()的所有内容都按预期使用它。

为什么要避免更改代码？

是的，衡量效果，看看是否值得。例如，输入上面的代码，看看事情的速度有多快。如果它只快1％，那么没有理由做任何事情。如果它明显更快，那么请查看使用open()的内容并在可能的情况下更改该代码。

BTW，像LRU cache（Python 3.2中的functools的一部分）也会对你的任务有所帮助。

Answer 2

您可以使用mmap模块：http://docs.python.org/library/mmap.html

Answer 3

我不确定这个库提供的功能在你的场景中是否有用，但是考虑提到linecache library的存在。来自链接的文档：

linecache模块允许从任何文件获取任何行，同时尝试使用缓存进行内部优化，这是从单个文件中读取许多行的常见情况。

...当然，这与您以优雅透明的方式实施解决方案的问题相差甚远......

缓存磁盘操作

3 个答案: