Question

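I have written a few small Python programs for simple data analysis. Python is easy to use and efficient.

Recently I have started running into cases where the data in my problem is too large to fit entirely in memory for Python to process.

I have been looking into possible persistence implementations for Python. I found pickle and a few other libraries that are quite interesting but do not quite meet my needs.

Simply put, the way pickle handles persistence is not transparent to the program. The programmer has to handle it explicitly: load, save, and so on.

I am wondering whether it could be done more seamlessly in code. For example,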

d1 = p({'k':'v'}) # where p() is the persistent version of dictionary
print d1['k'] # which gives 'v', same as if it is an ordinary dictionary
d1.dump('dict.pkl')  # save the dictionary in a file, or database, etc

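That is, overload the dictionary methods with persistent versions. It looks feasible to me, but I need to find out how many methods I would have to handle.

Browsing the Python source code might help, but I have not really dug that deep yet. I hope you can offer some guidance and direction.

Thanks!

Edit

Apologies, I was not very clear in my original question. I am not really looking at saving data structures, but rather for some internal "paging" mechanism that can run behind the scenes when my problem runs out of memory. For example,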

d1 = p({}) # create a persistent dictionary
d1['k1'] = 'v1' # add
# add another, maybe 1 billion more, entries on to the dictionary
print d1.has_key('k9999999999') # entry that is not in memory

All of it entirely behind the scenes. The programmer does not need to save, load, or search.

Answer 1

Take a look at ZODB. http://www.zodb.org/en/latest

It is a proven solution with transaction support.

Answer 2

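anydbm works almost exactly like your example and should be reasonably fast. One catch is that it only handles string keys and string values. I am not sure whether opening and closing the database adds too much overhead. You could wrap it in a context manager to make it a bit nicer. Also, you need to use a different file name each time you call p.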

import anydbm
def p(initial):
    d = anydbm.open('cache', 'c')
    for key, value in initial.items():  # not every dbm backend has update()
        d[key] = value
    return d

d1 = p({}) # create a persistent dictionary
d1['k1'] = 'v1' # add

# add another, maybe 1 billion more, entries on to the dictionary
for i in xrange(100000):
    d1['k{}'.format(i)] = 'v{}'.format(i)

print d1.has_key('k9999999999') # entry that is not in memory, prints False

d1.close() # You have to close it yourself
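
To illustrate the context-manager idea and one way around the strings-only limitation, here is a rough sketch for Python 3, where anydbm was renamed dbm. The class name PersistentDict and the choice to pickle the values are my own assumptions for illustration, not part of any library. It builds on collections.abc.MutableMapping, which derives the rest of the dictionary interface from five methods.

```python
import dbm
import os
import pickle
import tempfile
from collections.abc import MutableMapping


class PersistentDict(MutableMapping):
    """Hypothetical dict-like wrapper around a dbm file (illustration only)."""

    def __init__(self, path, initial=None):
        self._db = dbm.open(path, 'c')  # 'c': create the file if it is missing
        if initial:
            self.update(initial)

    @staticmethod
    def _key(key):
        # dbm stores bytes, so this sketch only accepts string keys.
        return key.encode('utf-8')

    def __getitem__(self, key):
        return pickle.loads(self._db[self._key(key)])

    def __setitem__(self, key, value):
        # Pickling lets values be any picklable object, not just strings.
        self._db[self._key(key)] = pickle.dumps(value)

    def __delitem__(self, key):
        del self._db[self._key(key)]

    def __iter__(self):
        return (k.decode('utf-8') for k in self._db.keys())

    def __len__(self):
        return len(self._db.keys())

    def close(self):
        self._db.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


workdir = tempfile.mkdtemp()
with PersistentDict(os.path.join(workdir, 'cache'), {'k': 'v'}) as d1:
    d1['numbers'] = [1, 2, 3]        # values no longer have to be strings
    print(d1['k'], 'missing' in d1)  # prints: v False

# Reopening the same file shows that the entries survived the close.
with PersistentDict(os.path.join(workdir, 'cache')) as d2:
    print(sorted(d2))                # prints: ['k', 'numbers']
```

The standard library's shelve module packages roughly the same combination of dbm and pickle, so it may be worth trying before writing a wrapper like this by hand.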

Answer 3

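web2py has a very nice database abstraction layer (DAL) for this kind of thing. It ships with SQLite out of the box, although you can swap SQLite for a different database such as PostgreSQL. For your case, SQLite should be enough. Your example would translate like this: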

# model goes into one file
# there's some preamble stuff I'm not showing here
db.define_table('p', Field('k'))

# controller goes into separate file
d1 = db.p.insert(k='v')  # this saves k='v' into the persistent 'p' database table, returning the record number, which is assigned to d1
print db.p[d1].k  # this would print "v"

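The model and the controller go into separate files. You can use web2py for the DAL alone. Alternatively, you can also use its Python templating to web-enable your application.

When reading back several records at once, you can convert the result with as_dict or as_list. See the DAL documentation for details.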
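
If pulling in web2py is more than you want, the same SQLite-backed idea can be sketched with Python's standard sqlite3 module. To be clear, this is not the DAL API: the class name SqliteDict and the table name kv are made up for illustration, and the sketch only stores string keys and string values.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical string-to-string mapping stored in one SQLite table."""

    def __init__(self, path=':memory:'):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)')

    def __getitem__(self, key):
        row = self._conn.execute(
            'SELECT v FROM kv WHERE k = ?', (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                'INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)', (key, value))

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute('DELETE FROM kv WHERE k = ?', (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Rows are fetched lazily, so the keys are never all in memory at once.
        for (key,) in self._conn.execute('SELECT k FROM kv'):
            yield key

    def __len__(self):
        return self._conn.execute('SELECT COUNT(*) FROM kv').fetchone()[0]

    def close(self):
        self._conn.close()


d1 = SqliteDict()  # pass a file path instead of ':memory:' to persist
d1['k1'] = 'v1'
print('k1' in d1, 'k9999999999' in d1)  # prints: True False
d1.close()
```

Each assignment here commits its own transaction, which keeps the sketch simple but would be slow for very large loads. Batching many inserts into one transaction would likely be the first thing to change.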

Persistent Python sets, lists and dictionaries?
