Storing large simulation results in Python with pickle/dill

Date: 2018-10-25 20:26:43

Tags: python pickle dill

I am trying to run a batch of (power system) simulations and save all the results into a dictionary. Here is how the data is organized:

Since my object structure isn't very complicated, I decided to use dill to store a dictionary containing a bunch of dictionaries (each of whose keys holds a class instance):

import os
import dill as pickle

class Results():
    # container for one network element's time series
    def __init__(self):
        self.volt = []
        self.angle = []
        self.freq = []

def save_obj(obj, name):
    # save obj as a pickle file under ./obj in the current directory
    currentdir = os.getcwd()
    objDir = currentdir + '/obj'
    if not os.path.isdir(objDir):
        os.mkdir(objDir)
    with open(objDir + '/' + name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL, recurse=True)




EventDict = {}

########### conceptual code to get all the data
# simList is a list of approximately 7200 events
for event in simList:
    ResultsDict = {}
    for element in network:  # 24 elements in network (23 buses, or nodes, plus time)
        # code to get voltage, angle and frequency (each of which is a list of 1200 elements)
        if element == 'time':
            ResultsDict['time'] = element  # the shared time vector
        else:
            ResultsDict[element] = Results()
            ResultsDict[element].volt = element.volt
            ResultsDict[element].angle = element.angle
            ResultsDict[element].freq = element.freq
    EventDict[event] = ResultsDict


save_obj(EventDict, 'EventData')
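
For reference, the load_obj called in the traceback below is just the mirror image of save_obj — a minimal sketch, assuming the same ./obj directory layout:

def load_obj(name):
    # read a pickle file back from ./obj in the current directory
    objDir = os.getcwd() + '/obj'
    with open(objDir + '/' + name + '.pkl', 'rb') as f:
        return pickle.load(f)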

The final pickled object is about 5 GB, and when I try to load it I get the following error saying it is out of memory:

Traceback (most recent call last):
  File "combineEventPkl.py", line 39, in <module>
    EventDict = load_obj(objStr)
  File "combineEventPkl.py", line 8, in load_obj
    return pickle.load(f)
  File "C:\Python27\lib\site-packages\dill\_dill.py", line 304, in load
    obj = pik.load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 964, in load_binfloat
    self.append(unpack('>d', self.read(8))[0])
MemoryError
no mem for new parser
MemoryError

Also, the unpickling takes a very long time before I even get this traceback. I realize the problem is that EventDict is huge. So I suppose what I'm asking is whether there is a better way to store this kind of time-series data, with some way to label each piece of data with a key so I know what it represents. I'm open to suggestions other than pickle, as long as it loads quickly and doesn't take too much effort to get into Python.

1 Answer:

Answer 0 (score: 0)

Check out "Fast data store for pandas time-series data using PyStore": https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2
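
A minimal sketch of what that could look like for your case, assuming one pandas DataFrame per event — the store, collection, item, and column names here are made up for illustration:

import pandas as pd
import pystore

pystore.set_path('./pystore_data')        # where the datastore lives on disk
store = pystore.store('simulations')      # hypothetical store name
collection = store.collection('events')   # hypothetical collection name

# one DataFrame per event: volt/angle/freq columns per bus, indexed by time
df = pd.DataFrame({'bus1_volt': [1.0] * 1200,
                   'bus1_angle': [0.0] * 1200,
                   'bus1_freq': [60.0] * 1200})
collection.write('event_0001', df, metadata={'event': 'event_0001'})

# later: load just the one event you need, instead of one 5 GB pickle
item = collection.item('event_0001')
df_back = item.to_pandas()

Because each event is stored as its own item, you only ever deserialize the events you actually need, which sidesteps the MemoryError.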

You may also need to chunk the data when reading it in: https://cmdlinetips.com/2018/01/how-to-load-a-massive-file-as-small-chunks-in-pandas/
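
For example, pandas can read a large file in pieces instead of all at once — a sketch, where 'events.csv' and the chunk size are placeholders:

import pandas as pd

# process the file 10,000 rows at a time instead of loading it whole
for chunk in pd.read_csv('events.csv', chunksize=10000):
    process(chunk)  # process() is a hypothetical per-chunk handler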