为什么对Pandas DataFrame进行深度复制不会影响内存使用?

时间:2019-04-18 12:22:11

标签: python pandas

我有此代码:

import resource
from copy import deepcopy

import pandas as pd
import random

n=100000
df = pd.DataFrame({'x': [random.randint(1, 1000) for i in range(n)],
                   'y': [random.randint(1, 1000) for i in range(n)],
                   'z': [random.randint(1, 1000) for i in range(n)]
                   })
df2=deepcopy(df)
print 'memory peak: {kb} kb'.format(kb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

我希望在df2=deepcopy(df) 行和没有行的情况下,此代码会看到不同的内存峰值用法。但是内存显示的结果完全相同。 deepcopy是否应该克隆对象并因此增加内存使用量?

1 个答案:

答案 0 :(得分:3)

在对象copy.deepcopy上调用foo时,将在其__dict__上查找__deepcopy__方法,该方法依次被调用。 对于DataFrame实例,它扩展了NDFrame类,该类的__deepcopy__方法将操作委托给NDFrame.copy

这里是the implementation of NDFrame.__deepcopy__

def __deepcopy__(self, memo=None):
    """
    Parameters
    ----------
    memo, default None
        Standard signature. Unused
    """
    if memo is None:
        memo = {}
    return self.copy(deep=True)

您可以阅读NDFrame.copy的文档字符串here,但这是主要摘录:

def copy(self, deep=True):
    """
    Make a copy of this object's indices and data.
    When ``deep=True`` (default), a new object will be created with a
    copy of the calling object's data and indices. Modifications to
    the data or indices of the copy will not be reflected in the
    original object (see notes below).
    (...)
    Notes
    -----
    When ``deep=True``, data is copied but actual Python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to `copy.deepcopy` in the Standard Library,
    which recursively copies object data (see examples below).

在您的示例中,数据是Python列表,因此实际上并未复制。