Question

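How can I yield an object from a generator and immediately forget it, so that it doesn't take up memory?

For example, in the following function: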

import itertools

def grouper(iterable, chunksize):
    """
    Return elements from the iterable in `chunksize`-ed lists. The last returned
    element may be smaller (if length of collection is not divisible by `chunksize`).

    >>> print list(grouper(xrange(10), 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    """
    i = iter(iterable)
    while True:
        chunk = list(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk

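I don't want the function to keep a reference to chunk after yielding it, because chunk is not used again there and only consumes memory, even after all outside references are gone.

EDIT: using the standard Python 2.5/2.6/2.7 from python.org.

Solution (proposed almost simultaneously by @phihag and @Owen): wrap the result in a (small) mutable object and return the chunk anonymously, leaving only the small container behind: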

def chunker(iterable, chunksize):
    """
    Return elements from the iterable in `chunksize`-ed lists. The last returned
    chunk may be smaller (if length of collection is not divisible by `chunksize`).

    >>> print list(chunker(xrange(10), 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    """
    i = iter(iterable)
    while True:
        wrapped_chunk = [list(itertools.islice(i, int(chunksize)))]
        if not wrapped_chunk[0]:
            break
        yield wrapped_chunk.pop()

With this memory optimization, you can now do something like:

for big_chunk in chunker(some_generator, chunksize=10000):
    # ... process big_chunk
    del big_chunk  # big_chunk ready to be garbage-collected :-)
    # ... do more stuff

Answer 1

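After yield chunk, the variable's value is never used again in the function, so a good interpreter/garbage collector will already free chunk for garbage collection (note: CPython 2.7 seems not to do this, PyPy 1.6 with the default gc does). Therefore, you don't have to change anything except your code example, which is missing the second argument to grouper.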

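Note that garbage collection is non-deterministic in Python. The null garbage collector, which doesn't collect free objects at all, is a perfectly valid garbage collector. From the Python manual: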

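Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether - it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.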

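Therefore, it cannot be decided whether a Python program does or "doesn't take up memory" without specifying the Python implementation and garbage collector. Given a specific Python implementation and garbage collector, you can use the gc module to test whether the object is really freed.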
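One way such a test could look is sketched below. It is only an illustration under stated assumptions: the names Chunk, grouper_keeping, grouper_forgetting and chunk_survives_del are made up for this example, and Chunk is a list subclass introduced solely because plain lists cannot be weakly referenced. The result depends on the interpreter; the outcome described in the comments is what reference counting in CPython is expected to give, and other implementations may differ.

```python
import gc
import itertools
import weakref


class Chunk(list):
    """Hypothetical list subclass; plain lists do not support weak references."""


def grouper_keeping(iterable, chunksize):
    # Same shape as the question's grouper: the local name `chunk`
    # still refers to the yielded object while the generator is suspended.
    i = iter(iterable)
    while True:
        chunk = Chunk(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk


def grouper_forgetting(iterable, chunksize):
    # Same shape as the pop() variant: only the small container stays behind.
    i = iter(iterable)
    while True:
        wrapped = [Chunk(itertools.islice(i, int(chunksize)))]
        if not wrapped[0]:
            break
        yield wrapped.pop()


def chunk_survives_del(gen_func):
    """Return True if the first chunk is still alive after the caller drops it."""
    gen = gen_func(range(10), 3)   # generator stays alive during the check
    chunk = next(gen)
    ref = weakref.ref(chunk)       # does not keep the chunk alive
    del chunk                      # drop the only outside reference
    gc.collect()                   # also give a tracing collector a chance
    return ref() is not None


if __name__ == "__main__":
    # On CPython the first call is expected to report a surviving chunk,
    # the second one a freed chunk.
    print(chunk_survives_del(grouper_keeping))
    print(chunk_survives_del(grouper_forgetting))
```

The weak reference is what makes the check possible: it observes the object without adding a strong reference of its own.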

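That being said, if you really want the function to hold no reference (which does not necessarily mean that the object will be garbage-collected), here's how to do it: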

def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        tmpr = [list(itertools.islice(i, int(chunksize)))]
        if not tmpr[0]:
            break
        yield tmpr.pop()

Instead of a list, you can also use any other data structure that has a function which removes and returns an object, such as Owen's wrapper.
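As a sketch of that idea with a different container, the variant below uses collections.deque, whose pop() also removes and returns an element. The name grouper_deque is hypothetical and not from the original answer, and the code is written so that it should also run on Python 3.

```python
import collections
import itertools


def grouper_deque(iterable, chunksize):
    """Yield lists of up to `chunksize` items, handing each one out via deque.pop()."""
    i = iter(iterable)
    holder = collections.deque(maxlen=1)  # small container that stays behind
    while True:
        holder.append(list(itertools.islice(i, int(chunksize))))
        if not holder[0]:
            break
        # pop() removes the chunk from the deque before it is yielded,
        # so the generator keeps no name bound to it.
        yield holder.pop()


if __name__ == "__main__":
    print(list(grouper_deque(range(10), 3)))
```

Any container works here as long as taking the item out and returning it happen in one expression, so no local variable is ever bound to the chunk.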

Answer 2

If you really, really want this functionality, I suppose you could use a wrapper:

class Wrap:

    def __init__(self, val):
        self.val = val

    def unlink(self):
        val = self.val
        self.val = None
        return val

which can be used like this:

def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        chunk = Wrap(list(itertools.islice(i, int(chunksize))))
        if not chunk.val:
            break
        yield chunk.unlink()

This is basically the same as what phihag does with pop() ;)

Answer 3

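The function grouper as defined has the artifact of creating wasteful duplicates, because you wrapped a function that has no effect around itertools.islice. The solution is to delete the redundant code.

I think there are concessions to C-derived languages here that are non-Pythonic and cause extra overhead. For example, you have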

i = iter(iterable)
itertools.islice(i)

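Why does i exist? iter will not convert a non-iterable into an iterable; there are no such conversions. If given a non-iterable, both of those lines would raise an exception; the first doesn't guard the second.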
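A quick check of the exception claim, plus one observable difference between slicing a list directly and slicing a shared iterator, might look like the sketch below. The helper raises_type_error and the sample values are made up for this illustration.

```python
import itertools


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


not_iterable = 5
# Both calls reject a non-iterable argument on their own.
iter_fails = raises_type_error(iter, not_iterable)
islice_fails = raises_type_error(itertools.islice, not_iterable, 2)

data = [0, 1, 2, 3, 4]
# islice applied to the list itself starts from the beginning each time.
from_list = [list(itertools.islice(data, 2)) for _ in range(2)]
# islice applied to one shared iterator resumes where the previous slice stopped.
shared = iter(data)
from_iter = [list(itertools.islice(shared, 2)) for _ in range(2)]

if __name__ == "__main__":
    print(iter_fails, islice_fails)
    print(from_list)
    print(from_iter)
```

So the iter call adds no protection against non-iterables, but when the input is a sequence rather than an iterator, it is what lets repeated islice calls advance through the data.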

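islice will happily act as an iterator (although it may give economies that a yield statement won't). You have too much code: grouper probably doesn't need to exist.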

Python: yield-and-delete

4 answers: