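I came across a great blog post online (about recommending products to customers with Python).
I applied the code to a real use case, but unfortunately it is very slow (probably because my dataset contains many more unique products and many more customers).
It has now been running for more than 2 days, and I am wondering: can this code be written more efficiently (for a faster runtime), or are nested for loops already the fastest approach in Python?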
Sample data:
UserId ItemId
1 Babyphone
1 Babyphone
1 CoffeeMachine
2 CoffeeMachine
2 Shaver
3 Shaver
3 CoffeeMachine
4 CoffeeMachine
4 Shaver
4 Blender
5 Blender
5 BabyPhone
5 Shaver
6 Shaver
7 CoffeeMachine
7 CoffeeMachine
8 BabyPhone
9 Blender
9 Blender
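To make the example reproducible without a CSV file, the sample data can be built inline. This is only a convenience sketch: the rows are copied as shown above, including the two spellings `Babyphone` and `BabyPhone`, which pandas treats as two different items.

```python
import pandas as pd

# Sample rows copied verbatim from the table above (UserId, ItemId).
rows = [
    (1, "Babyphone"), (1, "Babyphone"), (1, "CoffeeMachine"),
    (2, "CoffeeMachine"), (2, "Shaver"),
    (3, "Shaver"), (3, "CoffeeMachine"),
    (4, "CoffeeMachine"), (4, "Shaver"), (4, "Blender"),
    (5, "Blender"), (5, "BabyPhone"), (5, "Shaver"),
    (6, "Shaver"),
    (7, "CoffeeMachine"), (7, "CoffeeMachine"),
    (8, "BabyPhone"),
    (9, "Blender"), (9, "Blender"),
]

# Same variable name and column names as used in the code below.
userItemData = pd.DataFrame(rows, columns=["UserId", "ItemId"])
```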
Code:
import pandas as pd

# userItemData is assumed to hold the UserId/ItemId data shown above, e.g.:
# userItemData = pd.read_csv('example_data.csv')
userItemData.head()

# Get list of unique items
itemList = list(set(userItemData["ItemId"].tolist()))

# Get count of unique users
userCount = len(set(userItemData["UserId"].tolist()))

# Create an empty data frame to store item affinity scores for items.
itemAffinity = pd.DataFrame(columns=('item1', 'item2', 'score'))
rowCount = 0

# For each item in the list, compare with other items.
for ind1 in range(len(itemList)):
    # Get list of users who bought item 1.
    item1Users = userItemData[userItemData.ItemId == itemList[ind1]]["UserId"].tolist()
    # print("Item 1 ", item1Users)

    # Get item 2 - items that are not item 1 and have not been analyzed already.
    for ind2 in range(ind1, len(itemList)):
        if ind1 == ind2:
            continue

        # Get list of users who bought item 2
        item2Users = userItemData[userItemData.ItemId == itemList[ind2]]["UserId"].tolist()
        # print("Item 2", item2Users)

        # Find score: number of users who bought both items, divided by the total number of users.
        commonUsers = len(set(item1Users).intersection(set(item2Users)))
        score = commonUsers / userCount

        # Add a score for item 1, item 2
        itemAffinity.loc[rowCount] = [itemList[ind1], itemList[ind2], score]
        rowCount += 1
        # Add a score for item 2, item 1. The same score applies irrespective of the order.
        itemAffinity.loc[rowCount] = [itemList[ind2], itemList[ind1], score]
        rowCount += 1

# Check final result
itemAffinity.head()
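For comparison, here is a rough sketch of one way the same score (users who bought both items, divided by the number of unique users) could be computed without the nested loops, using a binary user-by-item matrix and a single matrix product. The function name `item_affinity` is made up for illustration and is not from the blog post. The sketch assumes the `UserId` and `ItemId` column names from the sample data and uses dense matrices, so it may not fit in memory for very large catalogs.

```python
import numpy as np
import pandas as pd


def item_affinity(user_item: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pairwise item scores via a matrix product."""
    # Binary user-by-item matrix: 1 if the user bought the item at least once.
    bought = (pd.crosstab(user_item["UserId"], user_item["ItemId"]) > 0).astype(int)
    items = bought.columns.to_numpy()
    n_users = bought.shape[0]

    # co[i, j] = number of users who bought both item i and item j.
    m = bought.to_numpy()
    co = m.T @ m

    # Keep every ordered pair (i, j) with i != j, as the loop version does.
    i, j = np.nonzero(~np.eye(len(items), dtype=bool))
    return pd.DataFrame(
        {"item1": items[i], "item2": items[j], "score": co[i, j] / n_users}
    )
```

Calling `item_affinity(userItemData)` should return the same item pairs and scores as the loop, although the rows come out in a different order. I have not timed it on the real dataset.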
Thanks a lot!