Question

我有一个数据集，其中确定满足特定条件的值用于执行概率计算，作为求和的一部分。目前，我将数据保存在嵌套字典中以简化确定性处理的过程。

我使用的算法证明非常昂贵，并且在一段时间后压倒了内存。

处理的psudocode如下：

for businessA in business : # iterate over 77039 values 
    for businessB in business : # iterate over 77039 values
        if businessA != businessB :
            for rating in business[businessB] : # where rating is 1 - 5
                for review in business[businessB][rating] :
                    user = reviewMap[review]['user'];
                    if user in business[businessA]['users'] :
                        for users in business[businessA]['users'] :
                            # do something
                # do probability
                # a print is here

如何更有效地编写上述内容以保持每个业务的准确概率总和？

EDIT 包括源代码 - 此处，businessA和businessB属于单独的词典，但值得注意的是，它们在每个词典中都包含相同的businessID（出价）。它只是每个键值的变化：值对。

def crossMatch(TbidMap) :
    for Tbid in TbidMap :
        for Lbid in LbidMap :
            # Ensure T and L aren't the same business
            if Tbid != Lbid :
                # Get numer of reviews at EACH STAR rate for L
                for stars in LbidMap[Lbid] :
                    posTbid = 0;
                    # For each review check if user rated the Tbid
                    for Lreview in LbidMap[Lbid][stars] :
                        user = reviewMap[Lreview]['user'];
                        if user in TbidMap[Tbid] :
                            # user rev'd Tbid, get their Trid & see if gave Tbid pos rev
                            for Trid in TbidMap[Tbid][user] :
                                Tstar = reviewMap[Trid]['stars'];
                                if Tstar in pos_list :
                                    posTbid += 1;
                    #probability calculations happen here

Answer 1

您的数据集中有超过50亿个公司组合，这实际上会强调内存。我认为你将所有结果存储在内存中;相反，我会对数据库进行临时转储并释放容器。这是该方法的草图，因为我没有可以测试的真实数据，并且在您遇到它们时可能更容易回应您的困难。理想情况下，嵌套列表会有一个临时容器，以便您可以使用executemany，但这是如此大量嵌套的缩写名称，没有测试数据很难遵循。

import sqlite3

def create_interim_mem_dump(cursor, connection):

    query = """CREATE TABLE IF NOT EXISTS ratings(
            Tbid TEXT,
            Lbid TEXT,
            posTbid TEXT)
            """
    cursor.execute(query)
    connection.commit()


def crossMatch(TbidMap, cursor, connection) :
    for Tbid in TbidMap :
        for Lbid in LbidMap :
            # Ensure T and L aren't the same business
            if Tbid != Lbid :
                # Get numer of reviews at EACH STAR rate for L
                for stars in LbidMap[Lbid] :
                    posTbid = 0;
                    # For each review check if user rated the Tbid
                    for Lreview in LbidMap[Lbid][stars] :
                        user = reviewMap[Lreview]['user'];
                        if user in TbidMap[Tbid] :
                            # user rev'd Tbid, get their Trid & see if gave Tbid pos rev
                            for Trid in TbidMap[Tbid][user] :
                                Tstar = reviewMap[Trid]['stars'];
                                if Tstar in pos_list :
                                    posTbid += 1;   
                    query = """INSERT INTO ratings (Tbid, Lbid, posTbid) 
                            VALUES (?, ?, ?)"""
                    cursor.execute(query, (Tbid, Lbid, posTbid))
        connection.commit()



if __name__ == '__main__':
    conn = sqlite3.connect('collated_ratings.db')
    c = conn.cursor()

    create_db = create_interim_mem_dump(c, conn)
    your_data = 'Some kind of dictionary into crossMatch()'
    c.close()
    conn.close()

通过嵌套的python词典

1 个答案: