I am trying to run a batch of (power system) simulations and save all the results into dictionaries. Here is the data organization:
Since my object structure is not that complicated, I decided to use dill to store a dictionary containing a bunch of dictionaries (where each key holds a class instance):
import os
import dill as pickle

class Results():
    def __init__(self):
        self.volt = []
        self.angle = []
        self.freq = []

def save_obj(obj, name):
    # save as a pickle object in the ./obj directory
    currentdir = os.getcwd()
    objDir = currentdir + '/obj'
    if not os.path.isdir(objDir):
        os.mkdir(objDir)
    with open(objDir + '/' + name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL, recurse=True)
EventDict = {}
########### conceptual code to get all the data
# simList is a list of approximately 7200 events
for event in simList:
    ResultsDict = {}
    for element in network:  # 24 elements in network (23 buses, or nodes, plus time)
        # code to get voltage, angle and frequency (each of which is a list of 1200 elements)
        if element == 'time':
            ResultsDict['time'] = element
        else:
            ResultsDict[element] = Results()
            ResultsDict[element].volt = element.volt
            ResultsDict[element].angle = element.angle
            ResultsDict[element].freq = element.freq
    EventDict[event] = ResultsDict
save_obj(EventDict, 'EventData')
The final pickle file is about 5 GB, and when I try to load it I get the following error indicating it ran out of memory:
Traceback (most recent call last):
  File "combineEventPkl.py", line 39, in <module>
    EventDict = load_obj(objStr)
  File "combineEventPkl.py", line 8, in load_obj
    return pickle.load(f)
  File "C:\Python27\lib\site-packages\dill\_dill.py", line 304, in load
    obj = pik.load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 964, in load_binfloat
    self.append(unpack('>d', self.read(8))[0])
MemoryError
no mem for new parser
MemoryError
Also, unpickling takes a very long time before I even get this traceback. I realize the problem is that EventDict is huge. So I guess what I am asking is: is there a better way to store this kind of time-series data, with the ability to tag each piece of data with a key so I know what it represents? I am open to suggestions other than pickle, as long as it loads quickly and doesn't take too much effort to get into python.
Answer 0 (score: 0)
Check out "Fast data store for pandas time-series data using PyStore": https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2
You may need to chunk the data when reading it in: https://cmdlinetips.com/2018/01/how-to-load-a-massive-file-as-small-chunks-in-pandas/
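Beyond chunking, a lightweight option not mentioned in the linked posts (sketched here as a suggestion, with hypothetical stand-in data) is the standard library's shelve module: store each event under its own key, so reading back unpickles one event at a time instead of the whole 5 GB dictionary. Python 3 syntax is shown; in Python 2.7 wrap the shelf in contextlib.closing instead of using `with` directly.

```python
import shelve

# hypothetical stand-ins for the question's simList / Results data
events = {'event1': {'bus1_volt': [1.00, 0.99]},
          'event2': {'bus1_volt': [1.01, 1.00]}}

# write: one key per event, so no single pickled object grows to gigabytes
with shelve.open('EventData') as db:
    for name, results in events.items():
        db[name] = results

# read: only the requested event is unpickled into memory
with shelve.open('EventData') as db:
    one_event = db['event2']
```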