我有一个数据集,其中确定满足特定条件的值用于执行概率计算,作为求和的一部分。目前,我将数据保存在嵌套字典中以简化确定性处理的过程。
我使用的算法证明非常昂贵,并且在一段时间后压倒了内存。
处理的psudocode如下:
for businessA in business : # iterate over 77039 values
for businessB in business : # iterate over 77039 values
if businessA != businessB :
for rating in business[businessB] : # where rating is 1 - 5
for review in business[businessB][rating] :
user = reviewMap[review]['user'];
if user in business[businessA]['users'] :
for users in business[businessA]['users'] :
# do something
# do probability
# a print is here
如何更有效地编写上述内容以保持每个业务的准确概率总和?
EDIT 包括源代码 - 此处,businessA和businessB属于单独的词典,但值得注意的是,它们在每个词典中都包含相同的businessID(出价)。它只是每个键值的变化:值对。
def crossMatch(TbidMap) :
for Tbid in TbidMap :
for Lbid in LbidMap :
# Ensure T and L aren't the same business
if Tbid != Lbid :
# Get numer of reviews at EACH STAR rate for L
for stars in LbidMap[Lbid] :
posTbid = 0;
# For each review check if user rated the Tbid
for Lreview in LbidMap[Lbid][stars] :
user = reviewMap[Lreview]['user'];
if user in TbidMap[Tbid] :
# user rev'd Tbid, get their Trid & see if gave Tbid pos rev
for Trid in TbidMap[Tbid][user] :
Tstar = reviewMap[Trid]['stars'];
if Tstar in pos_list :
posTbid += 1;
#probability calculations happen here
答案 0 :(得分:2)
您的数据集中有超过50亿个公司组合,这实际上会强调内存。我认为你将所有结果存储在内存中;相反,我会对数据库进行临时转储并释放容器。这是该方法的草图,因为我没有可以测试的真实数据,并且在您遇到它们时可能更容易回应您的困难。理想情况下,嵌套列表会有一个临时容器,以便您可以使用executemany
,但这是如此大量嵌套的缩写名称,没有测试数据很难遵循。
import sqlite3
def create_interim_mem_dump(cursor, connection):
query = """CREATE TABLE IF NOT EXISTS ratings(
Tbid TEXT,
Lbid TEXT,
posTbid TEXT)
"""
cursor.execute(query)
connection.commit()
def crossMatch(TbidMap, cursor, connection) :
for Tbid in TbidMap :
for Lbid in LbidMap :
# Ensure T and L aren't the same business
if Tbid != Lbid :
# Get numer of reviews at EACH STAR rate for L
for stars in LbidMap[Lbid] :
posTbid = 0;
# For each review check if user rated the Tbid
for Lreview in LbidMap[Lbid][stars] :
user = reviewMap[Lreview]['user'];
if user in TbidMap[Tbid] :
# user rev'd Tbid, get their Trid & see if gave Tbid pos rev
for Trid in TbidMap[Tbid][user] :
Tstar = reviewMap[Trid]['stars'];
if Tstar in pos_list :
posTbid += 1;
query = """INSERT INTO ratings (Tbid, Lbid, posTbid)
VALUES (?, ?, ?)"""
cursor.execute(query, (Tbid, Lbid, posTbid))
connection.commit()
if __name__ == '__main__':
conn = sqlite3.connect('collated_ratings.db')
c = conn.cursor()
create_db = create_interim_mem_dump(c, conn)
your_data = 'Some kind of dictionary into crossMatch()'
c.close()
conn.close()