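I have a few million records that I want to store, retrieve, and delete frequently. Each record has a "key", but the "value" does not translate easily to a dictionary, because it is an arbitrary Python object returned from a module method I didn't write. (I understand that many hierarchical data formats like json work better with dictionaries, and I'm not sure json is the preferred database in any case.)
I'm thinking of pickling each entry into a separate file. Is there a better way?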
Answer 0 (score: 3)
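Use the shelve module.
You can use it like a dictionary, much as you would with json, but it stores the objects using pickle.
From the official Python documentation: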
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level
# library
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists (has_key() was removed in Python 3)
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = list(range(4)) # this works as expected, but...
d['xx'].append(5) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2, 3]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
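As a self-contained variant of the snippet above, here is a minimal sketch that can be run as-is. The file name `records`, the helper `demo_shelf`, and the sample values are made up for illustration. Note that shelf keys must be strings, so a numeric key has to be converted with `str()` first; the values only need to be picklable.

```python
import os
import shelve
import tempfile
from datetime import date
from fractions import Fraction


def demo_shelf(directory):
    """Store, update and delete picklable objects, then reopen the shelf."""
    path = os.path.join(directory, "records")  # hypothetical file name

    with shelve.open(path) as db:  # the shelf is closed when the block exits
        # Keys must be strings; values can be any picklable object.
        db[str(42)] = {"when": date(2020, 1, 2), "ratio": Fraction(1, 3)}
        db[str(43)] = [1, 2, 3]

        # Opened without writeback=True: mutate a copy, then store it back.
        items = db["43"]
        items.append(4)
        db["43"] = items

        del db["42"]  # raises KeyError if the key is missing

    # Reopen the shelf to confirm the changes were persisted to disk.
    with shelve.open(path) as db:
        return sorted(db.keys()), db["43"]


with tempfile.TemporaryDirectory() as tmp:
    keys, value = demo_shelf(tmp)
```

After this runs, `keys` holds only `"43"` and `value` is the list with the appended element, which shows that the store-back pattern survives closing and reopening the shelf.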
Answer 1 (score: 1)
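I would evaluate using a key/value database such as Berkeley DB, Kyoto Cabinet, or similar. That gives you all the fancy features plus better handling of disk space. On a filesystem with a 4096 B block size, one million files occupy ~4 GB however small the objects are (this is a lower bound; the total grows if objects are larger than 4096 B).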
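The ~4 GB figure can be checked with a rough calculation. This is only a sketch: the helper `min_disk_usage` is hypothetical, and it assumes every file occupies a whole number of blocks, ignoring inode and directory overhead as well as filesystems that pack very small files more tightly.

```python
import math


def min_disk_usage(num_files, object_size, block_size=4096):
    """Rough lower bound, in bytes, for storing one object per file."""
    # Each file occupies at least one block, rounded up to whole blocks.
    blocks_per_file = max(1, math.ceil(object_size / block_size))
    return num_files * blocks_per_file * block_size


small = min_disk_usage(1_000_000, 100)    # 100-byte objects
large = min_disk_usage(1_000_000, 5000)   # objects just over one block
print(small / 1e9, "GB for small objects")
print(large / 1e9, "GB for larger objects")
```

Under these assumptions, a million 100-byte objects still take about 4.1 GB, and objects slightly larger than one block double that.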