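I'm using ZODB BTrees to store some data (about 1-2 GB per file). I created the ZODB files with:
import os
import transaction
import BTrees.OOBTree
from ZODB import DB
from ZODB.FileStorage import FileStorage
storage = FileStorage(os.path.join(zodb_dir, 'trainPatches.fs'))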
db = DB(storage)
connection = db.open()
root = connection.root()
root.patches = BTrees.OOBTree.BTree()
# take columns from a pandas data frame and insert into zodb
for col in df.columns:
    root.patches[col] = df[col]
transaction.commit()
connection.close()
db.close()
Now I'm reading them back in (and converting them back to a data frame):
db = DB(FileStorage(file_name))
connection = db.open()
root = connection.root()
# this is the key of the BTree that the files were saved under
pp = root.patches
p_list = list(pp.items())
# the items come back from ZODB as a list of (name, data) tuples; separate out names and data
column_names = [names[0] for names in p_list]
data_ = [names[1] for names in p_list]
# insert into data frame
patches_df = pd.DataFrame()
for i, c in enumerate(column_names):
    patches_df[c] = data_[i]
connection.close()
db.pack()
db.close()
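But after reading in about 30 of these files, the process gets killed at the terminal. I'm new to ZODB; is there something in this write/read sequence that could be hogging memory?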
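To check whether memory really grows with each file, a measurement loop along these lines could be wrapped around the read code. This is only a minimal sketch using the standard-library `tracemalloc` module: `track_memory` and `load_one` are hypothetical names introduced here for illustration (`load_one` would wrap the read code above and return the data frame), and the stand-in loader at the bottom just simulates a leak. Note that `tracemalloc` only sees allocations made through Python's allocators, so the figure the OS reports may differ.

```python
import gc
import tracemalloc


def track_memory(load_one, file_names):
    """Call load_one(name) for each file and record traced memory after each call.

    load_one is a hypothetical callable wrapping the read code above;
    it is an assumption for illustration, not part of ZODB.
    """
    tracemalloc.start()
    readings = []
    for name in file_names:
        result = load_one(name)
        del result    # drop our own reference to the loaded data
        gc.collect()  # collect unreachable cycles before measuring
        current, peak = tracemalloc.get_traced_memory()
        readings.append((name, current, peak))
    tracemalloc.stop()
    return readings


if __name__ == "__main__":
    # Stand-in loader that deliberately keeps data alive to mimic a leak.
    leaked = []

    def fake_loader(name):
        leaked.append(bytearray(1_000_000))
        return name

    files = ["a.fs", "b.fs", "c.fs"]  # placeholder file names
    for name, current, peak in track_memory(fake_loader, files):
        print(f"{name}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

If the "current" figure keeps climbing from one file to the next, something is still holding on to the data after each file has been processed.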