I am trying to run a batch of (power system) simulations and save all the results into dictionaries. Here is the data organization:
Since my object structure is not that complicated, I decided to use dill to store a dictionary containing a bunch of dictionaries (where each key holds a class instance):
import os
import dill as pickle

class Results():
    def __init__(self):
        self.volt = []
        self.angle = []
        self.freq = []

def save_obj(obj, name):
    # save as a pickle object in the ./obj directory
    currentdir = os.getcwd()
    objDir = currentdir + '/obj'
    if not os.path.isdir(objDir):
        os.mkdir(objDir)
    with open(objDir + '/' + name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL, recurse=True)
EventDict = {}
########### conceptual code to get all the data
# simList is a list of approximately 7200 events
for event in simList:
    ResultsDict = {}
    for element in network:  # 24 elements in network (23 buses, or nodes, plus time)
        # code to get voltage, angle and frequency (each of which is a list of 1200 elements)
        if element == 'time':
            ResultsDict['time'] = element
        else:
            ResultsDict[element] = Results()
            ResultsDict[element].volt = element.volt
            ResultsDict[element].angle = element.angle
            ResultsDict[element].freq = element.freq
    EventDict[event] = ResultsDict
save_obj(EventDict, 'EventData')
The final pickle file is about 5 GB, and when I try to load it I get the following error indicating it ran out of memory:
Traceback (most recent call last):
  File "combineEventPkl.py", line 39, in <module>
    EventDict = load_obj(objStr)
  File "combineEventPkl.py", line 8, in load_obj
    return pickle.load(f)
  File "C:\Python27\lib\site-packages\dill\_dill.py", line 304, in load
    obj = pik.load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 964, in load_binfloat
    self.append(unpack('>d', self.read(8))[0])
MemoryError
no mem for new parser
MemoryError
Also, unpickling takes a very long time before I even get this traceback. I realize the problem is that EventDict is huge. So I guess what I am asking is: is there a better way to store this kind of time-series data, with the ability to tag each piece of data with a key so I know what it represents? I am open to suggestions other than pickle, as long as it loads quickly and doesn't take too much effort to get into python.
Answer 0 (score: 0)
Check out "Fast data store for pandas time-series data using PyStore": https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2
You may need to chunk the data when reading it in: https://cmdlinetips.com/2018/01/how-to-load-a-massive-file-as-small-chunks-in-pandas/
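Beyond chunking, a lightweight option not mentioned in the linked posts (sketched here as a suggestion, with hypothetical stand-in data) is the standard library's shelve module: store each event under its own key, so reading back unpickles one event at a time instead of the whole 5 GB dictionary. Python 3 syntax is shown; in Python 2.7 wrap the shelf in contextlib.closing instead of using `with` directly.

```python
import shelve

# hypothetical stand-ins for the question's simList / Results data
events = {'event1': {'bus1_volt': [1.00, 0.99]},
          'event2': {'bus1_volt': [1.01, 1.00]}}

# write: one key per event, so no single pickled object grows to gigabytes
with shelve.open('EventData') as db:
    for name, results in events.items():
        db[name] = results

# read: only the requested event is unpickled into memory
with shelve.open('EventData') as db:
    one_event = db['event2']
```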