Alternatives for speeding up nested for loops (product recommendation)

Asked: 2019-08-06 10:08:41

Tags: python pandas for-loop

I came across an amazing blog post online (recommending products to customers with Python).

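I applied the code to a real use case, but unfortunately it is very slow (probably because my dataset contains many more unique products and many more customers).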

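It has now been running for more than two days, and I am wondering: can this code be written more efficiently (for a shorter run time), or are nested for loops the fastest approach in Python?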

Sample data:

UserId      ItemId

1           Babyphone
1           Babyphone
1           CoffeeMachine
2           CoffeeMachine
2           Shaver
3           Shaver
3           CoffeeMachine
4           CoffeeMachine
4           Shaver
4           Blender
5           Blender
5           BabyPhone
5           Shaver
6           Shaver
7           CoffeeMachine
7           CoffeeMachine
8           BabyPhone
9           Blender
9           Blender
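
For experimenting without the CSV file, the sample data above can be built directly as a DataFrame. This is a minimal sketch; the spelling of each item is kept exactly as in the table, so "Babyphone" and "BabyPhone" are distinct strings and will be treated as two different items.

```python
import pandas as pd

# Rows copied from the sample table above, as (UserId, ItemId) pairs.
# Item spelling is kept as given; string comparison is case-sensitive.
rows = [
    (1, "Babyphone"), (1, "Babyphone"), (1, "CoffeeMachine"),
    (2, "CoffeeMachine"), (2, "Shaver"),
    (3, "Shaver"), (3, "CoffeeMachine"),
    (4, "CoffeeMachine"), (4, "Shaver"), (4, "Blender"),
    (5, "Blender"), (5, "BabyPhone"), (5, "Shaver"),
    (6, "Shaver"),
    (7, "CoffeeMachine"), (7, "CoffeeMachine"),
    (8, "BabyPhone"),
    (9, "Blender"), (9, "Blender"),
]

userItemData = pd.DataFrame(rows, columns=["UserId", "ItemId"])
```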

Code:

import pandas as pd

#Load the user/item data (columns: UserId, ItemId)
userItemData = pd.read_csv('example_data.csv')
userItemData.head()

#Get list of unique items
itemList=list(set(userItemData["ItemId"].tolist()))

#Get count of users
userCount=len(set(userItemData["UserId"].tolist()))

#Create an empty data frame to store item affinity scores for items.
itemAffinity= pd.DataFrame(columns=('item1', 'item2', 'score'))
rowCount=0

#For each item in the list, compare with other items.
for ind1 in range(len(itemList)):

    #Get list of users who bought item 1.
    item1Users = userItemData[userItemData.ItemId==itemList[ind1]]["UserId"].tolist()
    #print("Item 1 ", item1Users)

    #Get item 2 - items that are not item 1 or those that are not analyzed already.
    for ind2 in range(ind1, len(itemList)):

        if ( ind1 == ind2):
            continue

        #Get list of users who bought item 2
        item2Users=userItemData[userItemData.ItemId==itemList[ind2]]["UserId"].tolist()
        #print("Item 2",item2Users)

        #Find score. Find the common list of users and divide it by the total users.
        commonUsers= len(set(item1Users).intersection(set(item2Users)))
        score=commonUsers / userCount

        #Add a score for item 1, item 2
        itemAffinity.loc[rowCount] = [itemList[ind1],itemList[ind2],score]
        rowCount +=1
        #Add a score for item2, item 1. The same score would apply irrespective of the sequence.
        itemAffinity.loc[rowCount] = [itemList[ind2],itemList[ind1],score]
        rowCount +=1

#Check final result
itemAffinity.head()
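
One direction that might avoid the nested loops: the score for a pair of items is the number of users who bought both, divided by the total number of users, and that count can be obtained for all pairs at once as a matrix product of a binary user-by-item table with itself. The following is a minimal sketch under that assumption, not a tested replacement on the real dataset; the small DataFrame is hypothetical illustration data with the same two columns, and the intermediate names (`bought`, `common`, `scores`) are made up for the example.

```python
import pandas as pd

# Hypothetical illustration data with the same columns as the sample data.
userItemData = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
    "ItemId": ["Babyphone", "Babyphone", "CoffeeMachine",
               "CoffeeMachine", "Shaver",
               "Shaver", "CoffeeMachine",
               "CoffeeMachine", "Shaver", "Blender"],
})

# Total number of distinct users (same meaning as userCount in the loop version).
userCount = userItemData["UserId"].nunique()

# Binary user x item table: 1 if the user bought the item at least once.
bought = (pd.crosstab(userItemData["UserId"], userItemData["ItemId"]) > 0).astype(int)

# Item x item table: entry (i, j) is the number of users who bought both i and j.
common = bought.T.dot(bought)

# Same score as in the loop: common users divided by total users.
scores = (common / userCount).rename_axis(index="item1", columns="item2")

# Reshape to one row per (item1, item2) pair and drop each item paired with itself.
itemAffinity = scores.reset_index().melt(
    id_vars="item1", var_name="item2", value_name="score"
)
itemAffinity = itemAffinity[itemAffinity["item1"] != itemAffinity["item2"]]
itemAffinity = itemAffinity.reset_index(drop=True)
```

Like the loop version, this yields both orderings of every pair, including pairs with a score of 0. The dense item-by-item table grows with the square of the number of unique items, so for a very large catalogue a sparse matrix representation would probably be needed instead.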

Thanks a lot!

0 answers:

No answers yet