Question

I have a large dictionary whose structure looks like this:

dcPaths = {'id_jola_001': CPath instance}

where CPath is a custom class:

class CPath(object):
    def __init__(self):
        # some attributes
        self.m_dAvgSpeed = 0.0
        ...
        # a list of CNode instances
        self.m_lsNodes = []

where m_lsNodes is a list of CNode:

class CNode(object):
    def __init__(self):
        # some attributes
        self.m_nLoc = 0

        # a list of Apps
        self.m_lsApps = []

Here, m_lsApps is a list of CApp, which is another custom class:

class CApp(object):
    def __init__(self):
        # some attributes
        self.m_nCount = 0
        self.m_nUpPackets = 0

I serialize this dictionary using cPickle:

def serialize2File(strFileName, strOutDir, obj):
    if len(obj) != 0:
        strOutFilePath = "%s%s" % (strOutDir, strFileName)
        with open(strOutFilePath, 'w') as hOutFile:
            cPickle.dump(obj, hOutFile, protocol=0)
        return strOutFilePath
    else:
        print("Nothing to serialize!")

It works fine, and the serialized file is about 6.8 GB. However, when I try to deserialize this object:

def deserializeFromFile(strFilePath):
    obj = 0
    with open(strFilePath) as hFile:
        obj = cPickle.load(hFile)
    return obj

I found that it consumes more than 90 GB of memory and takes a very long time.

Why does this happen?
Is there any way to optimize it?

BTW, I am using Python 2.7.6.

Answer 1

When you store complex Python objects, Python usually stores a lot of useless data along with them (take a look at the object's __dict__ attribute).

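To reduce the memory consumption of the unserialized data, you should pickle only Python native types. You can do this easily by implementing two methods in your classes: object.__getstate__() and object.__setstate__(state).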

See "Pickling and unpickling normal class instances" in the Python documentation.
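As a rough illustration of the two methods mentioned above, here is a minimal sketch using the CApp class from the question. The choice of a plain tuple as the pickled state is my own assumption, not something the question's code does, and the sketch imports `pickle` so it also runs on Python 3 (on Python 2.7 you would use `import cPickle as pickle`). Note that this mainly shrinks what is written to the pickle stream; the restored instances still carry a normal __dict__.

```python
import pickle  # on Python 2.7: import cPickle as pickle


class CApp(object):
    """Stand-in for the CApp class from the question."""

    def __init__(self):
        self.m_nCount = 0
        self.m_nUpPackets = 0

    def __getstate__(self):
        # Pickle a plain tuple of native values instead of the instance
        # __dict__, so no attribute names go into each object's state.
        return (self.m_nCount, self.m_nUpPackets)

    def __setstate__(self, state):
        # Rebuild the attributes from the tuple produced by __getstate__.
        self.m_nCount, self.m_nUpPackets = state


app = CApp()
app.m_nCount = 3
app.m_nUpPackets = 42

# Round-trip through pickle; protocol 2 is available on Python 2.7 and 3.
restored = pickle.loads(pickle.dumps(app, protocol=2))
print(restored.m_nCount, restored.m_nUpPackets)
```

The same pattern would have to be repeated for CNode and CPath to cover the whole structure. Whether it helps enough for a 6.8 GB file is something you would need to measure.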

Answer 2

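You can try specifying the pickle protocol; the fastest is -1 (meaning: the latest protocol, which is no problem if you use the same Python version for pickling and unpickling).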

cPickle.dump(obj, file, protocol = -1)

Edit: as noted in the comments, load detects the protocol by itself.

obj = cPickle.load(file)
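To make the protocol suggestion concrete, here is a small self-contained sketch. The sample dictionary and the temporary file path are made up for illustration, and it imports `pickle` so it also runs on Python 3 (on Python 2.7, `import cPickle as pickle` gives the C implementation). One detail worth stressing: the binary protocols need the file opened in binary mode ('wb' and 'rb'), unlike the 'w' mode used in the question.

```python
import os
import pickle  # on Python 2.7: import cPickle as pickle
import tempfile

# Made-up sample data standing in for the real dictionary.
data = {"id_%03d" % i: [i, i * 2, i * 3] for i in range(1000)}

path = os.path.join(tempfile.mkdtemp(), "paths.pkl")

# protocol=-1 selects the highest protocol this interpreter supports.
# Binary protocols write raw bytes, so the file must be opened with 'wb'.
with open(path, "wb") as out_file:
    pickle.dump(data, out_file, protocol=-1)

# load() takes only the file; the protocol is detected from the stream.
with open(path, "rb") as in_file:
    loaded = pickle.load(in_file)

# Compare against the text protocol used in the question.
text_size = len(pickle.dumps(data, protocol=0))
binary_size = os.path.getsize(path)
print(text_size, binary_size)
```

On this toy data the binary file comes out smaller than the protocol 0 output. How much it helps with load time and memory on the real 6.8 GB file is something only a measurement can tell.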

cPickle.load() in Python consumes a large amount of memory