Question

我有此代码：

import resource
from copy import deepcopy

import pandas as pd
import random

n=100000
df = pd.DataFrame({'x': [random.randint(1, 1000) for i in range(n)],
                   'y': [random.randint(1, 1000) for i in range(n)],
                   'z': [random.randint(1, 1000) for i in range(n)]
                   })
df2=deepcopy(df)
print 'memory peak: {kb} kb'.format(kb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

我希望在df2=deepcopy(df)行和没有行的情况下，此代码会看到不同的内存峰值用法。但是内存显示的结果完全相同。 deepcopy是否应该克隆对象并因此增加内存使用量？

Answer 1

在对象copy.deepcopy上调用foo时，将在其__dict__上查找__deepcopy__方法，该方法依次被调用。对于DataFrame实例，它扩展了NDFrame类，该类的__deepcopy__方法将操作委托给NDFrame.copy。

这里是the implementation of NDFrame.__deepcopy__：

def __deepcopy__(self, memo=None):
    """
    Parameters
    ----------
    memo, default None
        Standard signature. Unused
    """
    if memo is None:
        memo = {}
    return self.copy(deep=True)

您可以阅读NDFrame.copy的文档字符串here，但这是主要摘录：

def copy(self, deep=True):
    """
    Make a copy of this object's indices and data.
    When ``deep=True`` (default), a new object will be created with a
    copy of the calling object's data and indices. Modifications to
    the data or indices of the copy will not be reflected in the
    original object (see notes below).
    (...)
    Notes
    -----
    When ``deep=True``, data is copied but actual Python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to `copy.deepcopy` in the Standard Library,
    which recursively copies object data (see examples below).

在您的示例中，数据是Python列表，因此实际上并未复制。

为什么对Pandas DataFrame进行深度复制不会影响内存使用？

1 个答案: