How to get a distinct count in a Pandas groupby

Time: 2017-03-31 09:39:52

Tags: python pandas

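I want to get the number of distinct products for each order_number. I managed to get the total_product count (thanks to help from another SO user), but I can't figure out how to get the distinct count.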

This is what I have:

data['total_productcount'] = data.groupby(['order_number'])['order_number'].transform('size')

It gives:

order_number          product_id      total_productcount   
171-1046037-0511522   4260179734731   5                    
171-1046037-0511522   4054673034394   5                   
171-1046037-0511522   4054673001235   5                   
171-1046037-0511522   4054673005752   5                    
171-1046037-0511522   5011385960075   5                    
171-1046037-0511522   5011385960075   5    

This is the dataframe I want to produce (including distict_productcount):

order_number          product_id      total_productcount   distict_productcount
171-1046037-0511522   4260179734731   5                    1
171-1046037-0511522   4054673034394   5                    1
171-1046037-0511522   4054673001235   5                    1
171-1046037-0511522   4054673005752   5                    1
171-1046037-0511522   5011385960075   5                    1
171-1046037-0511522   5011385960075   5                    2

How can I generate "distict_productcount"?

1 answer:

Answer 0: (score: 2)

data.groupby('order_number').product_id.nunique()
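
To see what this call returns, here is a minimal sketch on a made-up frame (the order numbers and product IDs are illustrative assumptions, not the asker's data). The result is a Series indexed by order_number, with one distinct-product count per order rather than one per row.

```python
import pandas as pd

# Hypothetical sample data: order "A" has 3 rows but only 2 distinct products.
data = pd.DataFrame({
    "order_number": ["A", "A", "A", "B"],
    "product_id": [101, 102, 102, 201],
})

# One value per order: the number of distinct product_id values in that group.
distinct_per_order = data.groupby("order_number")["product_id"].nunique()
print(distinct_per_order)
```

Because this Series has one entry per order, it still has to be mapped back onto the original rows before it can serve as a column.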

You can use transform or join to get the new column.

Via transform

s = data.groupby('order_number').product_id.transform('nunique')
data = data.assign(distinct_productcount=s)

Via join

s = data.groupby('order_number').product_id.nunique()
data = data.join(s.rename('distinct_productcount'), on='order_number')
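
The two approaches above can be compared side by side. The following is a minimal sketch on hypothetical sample data (not the asker's real orders), using a single frame name, `data`, throughout. The names `via_transform` and `via_join` are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real orders.
data = pd.DataFrame({
    "order_number": ["A", "A", "A", "B"],
    "product_id": [101, 102, 102, 201],
})

grouped = data.groupby("order_number")["product_id"]

# transform broadcasts each group's result back onto the original rows,
# so the resulting Series aligns with data's index.
via_transform = data.assign(distinct_productcount=grouped.transform("nunique"))

# nunique alone gives one value per order; join looks it up by order_number.
via_join = data.join(
    grouped.nunique().rename("distinct_productcount"),
    on="order_number",
)

print(via_transform)
print(via_join)
```

transform is usually the shorter choice when only one new column is needed. join may be more convenient when the per-order Series is also wanted on its own or several aggregates are attached at once.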