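I have a CSV of sales data that I want to iterate over to list the categories the same buyer frequently buys together. I figured I could do this with a dictionary and a script like this, but I can't work out how to count how many times the same buyer appears across different categories.
CSV data sample: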
buyer_id, order_id, category
1, 10, shoes
1, 11, outerwear
2, 12, beauty
2, 13, shoes
2, 14, outerwear
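In that sample, I want to find out that shoes and outerwear form a combination that occurs at least twice (both buyer 1 and buyer 2 bought both).
Answer 0 (score: 1)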
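The dictionary idea can work with only the standard library: map each buyer to the set of categories they bought, then count every unordered pair of categories within each buyer's set. A minimal sketch, assuming a comma-separated file with a header row like the sample above; the `count_category_pairs` name is hypothetical, and the inline string stands in for the real file:

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

# Stand-in for the real file; with a file on disk you would use
# open("sales.csv", newline="") instead ("sales.csv" is a hypothetical name).
SAMPLE = """buyer_id, order_id, category
1, 10, shoes
1, 11, outerwear
2, 12, beauty
2, 13, shoes
2, 14, outerwear
"""

def count_category_pairs(rows):
    """Count how many distinct buyers bought each pair of categories."""
    # Map each buyer to the set of categories they bought (repeats collapse).
    categories_by_buyer = defaultdict(set)
    for row in rows:
        categories_by_buyer[row["buyer_id"]].add(row["category"])

    # For each buyer, count every unordered pair of their categories once.
    # Sorting makes ('outerwear', 'shoes') and ('shoes', 'outerwear') one key.
    pair_counts = Counter()
    for categories in categories_by_buyer.values():
        for pair in combinations(sorted(categories), 2):
            pair_counts[pair] += 1
    return pair_counts

# skipinitialspace=True drops the blank after each comma in the sample.
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
pair_counts = count_category_pairs(reader)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

On the sample, ('outerwear', 'shoes') gets a count of 2 and every other pair gets 1. Because each buyer's categories are stored in a set, a buyer who buys the same category twice still contributes only once per pair.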
import pandas as pd
# Create the DataFrame
data = pd.DataFrame(
{'Buyer_ID': [1,1,2,2,2,1],
'Order_ID': [10,11,12,13,14,15],
'Category':['shoes','outerwear','beauty','shoes','outerwear','shoes']
})
data
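If the data already lives in a CSV like the one in the question, it can be loaded instead of typed in. A sketch assuming the comma-plus-space layout of the sample; the inline string stands in for the file, and the rename only maps the lowercase CSV headers onto the column names this answer uses:

```python
import io

import pandas as pd

# Stand-in for the real file; pass a path such as "sales.csv"
# (a hypothetical name) to read_csv instead of the StringIO object.
csv_text = """buyer_id, order_id, category
1, 10, shoes
1, 11, outerwear
2, 12, beauty
2, 13, shoes
2, 14, outerwear
"""

# skipinitialspace=True strips the blank after each comma, so headers
# and values come through without leading spaces.
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Match the column names used in the rest of this answer.
data = data.rename(columns={
    "buyer_id": "Buyer_ID",
    "order_id": "Order_ID",
    "category": "Category",
})
print(data)
```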
Out[]:
Buyer_ID Category Order_ID
0 1 shoes 10
1 1 outerwear 11
2 2 beauty 12
3 2 shoes 13
4 2 outerwear 14
5 1 shoes 15
# Count rows per (buyer, category): each buyer's distinct categories with how often each was bought
data.groupby(["Buyer_ID", "Category"]).size()
# Buyer 1's two shoes orders collapse into a single row with a count of 2, so each category appears once per buyer.
Out[]:
Buyer_ID Category
1 outerwear 1
shoes 2
2 beauty 1
outerwear 1
shoes 1
dtype: int64
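The grouped counts show each buyer's categories but not which categories occur together. One way to get from there to the pair counts the question asks for is to turn each buyer's distinct categories into unordered pairs with `itertools.combinations` and tally them across buyers. A minimal sketch building on the DataFrame above:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

data = pd.DataFrame({
    'Buyer_ID': [1, 1, 2, 2, 2, 1],
    'Order_ID': [10, 11, 12, 13, 14, 15],
    'Category': ['shoes', 'outerwear', 'beauty', 'shoes', 'outerwear', 'shoes'],
})

pair_counts = Counter()
for _, categories in data.groupby("Buyer_ID")["Category"]:
    # set() collapses repeat purchases (buyer 1 bought shoes twice);
    # sorted() makes each unordered pair a single, consistent key.
    for pair in combinations(sorted(set(categories)), 2):
        pair_counts[pair] += 1

# Tuple keys become a MultiIndex; sort so the most common pairs come first.
pairs = pd.Series(pair_counts).sort_values(ascending=False)

# Keep only pairs bought together by at least two buyers.
print(pairs[pairs >= 2])
```

With this data, only ('outerwear', 'shoes') passes the `>= 2` filter, which matches the combination described in the question.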