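I want to get the number of distinct products for each order_number. I managed to get the total product count (thanks to help from another SO user), but I can't figure out how to get the distinct count.
This is what I have: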
data['total_productcount'] = data.groupby(['order_number'])['order_number'].transform('size')
which gives:
order_number product_id total_productcount
171-1046037-0511522 4260179734731 5
171-1046037-0511522 4054673034394 5
171-1046037-0511522 4054673001235 5
171-1046037-0511522 4054673005752 5
171-1046037-0511522 5011385960075 5
171-1046037-0511522 5011385960075 5
This is the DataFrame I want to produce (including the distict_productcount column):
order_number product_id total_productcount distict_productcount
171-1046037-0511522 4260179734731 5 1
171-1046037-0511522 4054673034394 5 1
171-1046037-0511522 4054673001235 5 1
171-1046037-0511522 4054673005752 5 1
171-1046037-0511522 5011385960075 5 1
171-1046037-0511522 5011385960075 5 2
How can I generate "distict_productcount"?
Answer 0 (score: 2)
data.groupby('order_number').product_id.nunique()
To attach this per-order count to every row, you can use either transform or join.
With transform:
s = data.groupby('order_number').product_id.transform('nunique')
data = data.assign(distinct_productcount=s)
With join:
s = data.groupby('order_number').product_id.nunique()
data = data.join(s.rename('distinct_productcount'), on='order_number')
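For reference, here is a minimal, self-contained sketch of both approaches. It assumes the six sample rows from the question (order_number and product_id only). The names via_transform, via_join and per_order are made up for illustration. With this sample, both approaches should put the same value on every row of the order, because nunique counts distinct product_id values per order rather than producing a running count.

```python
import pandas as pd

# Sample rows copied from the question: one order, one product_id repeated.
data = pd.DataFrame({
    "order_number": ["171-1046037-0511522"] * 6,
    "product_id": [
        4260179734731, 4054673034394, 4054673001235,
        4054673005752, 5011385960075, 5011385960075,
    ],
})

# Option 1: transform broadcasts the per-order result back onto each row.
via_transform = data.assign(
    distinct_productcount=data.groupby("order_number")["product_id"].transform("nunique")
)

# Option 2: aggregate to one value per order, then join it back on the key.
per_order = data.groupby("order_number")["product_id"].nunique()
via_join = data.join(per_order.rename("distinct_productcount"), on="order_number")

print(via_transform)
print(via_join)
```

transform is usually the shorter option when the only goal is a new column. The join variant can be handy if the per-order Series (per_order here) is also needed on its own.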