Question

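I'm generating a file of about 5 GB with pickle.dump(). Loading this file takes about half a day and about 50 GB of RAM. My question is whether it is possible to read this file by accessing entries individually (one at a time) rather than loading it all into memory, or whether you have any other suggestion for how to access the data in such a file.

Many thanks.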

Answer 1

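Without a doubt, this should be done with a database rather than pickle; databases are designed for exactly this kind of problem.

Here is some code to get you started. It puts a dictionary into an SQLite database and shows an example of retrieving a value. To make it work with your actual dictionary rather than my toy example, you'll need to learn more about SQL, but fortunately there are many excellent resources available online. In particular, you might want to learn how to use SQLAlchemy, an "object-relational mapper" that can make working with a database as intuitive as working with objects.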

import sqlite3

# an enormous dictionary too big to be stored in pickle
my_huge_dictionary = {"A": 1, "B": 2, "C": 3, "D": 4}

# create a database in the file my.db
conn = sqlite3.connect('my.db')
c = conn.cursor()

# Create a table with two columns: k and v (for key and value). Here your key
# is assumed to be a string of length 10 or less, and your value is assumed
# to be an integer. I'm sure this is NOT the structure of your dictionary;
# you'll have to read up on SQL data types
c.execute("""
create table dictionary (
k char(10) NOT NULL,
v integer NOT NULL,
PRIMARY KEY (k))
""")

# dump your enormous dictionary into a database. This will take a while for
# your large dictionary, but you should do it only once, and then in the future
# make changes to your database rather than to a pickled file.
# The ? placeholders let sqlite3 handle quoting of keys and values.
for k, v in my_huge_dictionary.items():
    c.execute("insert into dictionary VALUES (?, ?)", (k, v))

# write the inserted rows to my.db
conn.commit()

# retrieve a value from the database
my_key = "A"
c.execute("select v from dictionary where k = ?", (my_key,))
my_value = c.fetchone()[0]
print(my_value)
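
If your values are not plain integers, one possible variation is to keep one row per key but store each value as its own small pickle in a BLOB column, so that a lookup only unpickles the single entry you ask for. The following is a minimal sketch under assumptions that are not part of the question: the keys are strings, every value can be pickled on its own, and the table name pickled_dictionary and the helpers save_dict and load_value are made up for illustration.

```python
import pickle
import sqlite3


def save_dict(conn, data):
    """Store each dictionary entry as its own row, pickling only the value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pickled_dictionary "
        "(k TEXT PRIMARY KEY, v BLOB NOT NULL)"
    )
    # A generator keeps memory use low: rows are pickled one at a time.
    rows = (
        (key, pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
        for key, value in data.items()
    )
    # INSERT OR REPLACE lets the function be re-run without key conflicts.
    conn.executemany(
        "INSERT OR REPLACE INTO pickled_dictionary (k, v) VALUES (?, ?)", rows
    )
    conn.commit()


def load_value(conn, key):
    """Fetch and unpickle one value; only that row is read into memory."""
    row = conn.execute(
        "SELECT v FROM pickled_dictionary WHERE k = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(row[0])


# Toy data standing in for the real dictionary (hypothetical contents).
example = {"A": [1, 2, 3], "B": {"nested": True}, "C": "text"}
conn = sqlite3.connect(":memory:")  # use a path such as 'my.db' to persist
save_dict(conn, example)
print(load_value(conn, "B"))
```

The one-time conversion still has to load the original pickle once, but afterwards each lookup touches only a single row instead of the whole 5 GB file.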

Good luck!

Answer 2

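If your data is heterogeneous, you could try an object-oriented database such as ZODB. It uses pickle internally, but in a way that is designed, and proven over time, to manage large amounts of data, so you will probably need only small changes to your application.

ZODB is the core of Zope, a Python application server that today powers Plone, among other applications.

It can be used stand-alone, without all of Zope's tooling. If your data is not a good fit for SQL, you should check it out.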

http://www.zodb.org/

Loading a huge Python pickle dictionary
