>>> import pandas as pd
>>> df=pd.DataFrame({'order No':[71,71,71,71,71,71,71,72,72,72,72,72,72,72,73,73],'product id':[123,12,123,123,123,15,16,14,112,15,112,112,12,112,100,101],'Category':['product','service','product','product','product','service','service','service','product','service','product','product','service','product','service','service']})
>>> df
order No product id Category
0 71 123 product
1 71 12 service
2 71 123 product
3 71 123 product
4 71 123 product
5 71 15 service
6 71 16 service
7 72 14 service
8 72 112 product
9 72 15 service
10 72 112 product
11 72 112 product
12 72 12 service
13 72 112 product
14 73 100 service
15 73 101 service
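Expected output:
order No Category Count of product
71 Product 2
72 Product 3
How can I find, for each order number, the number of non-repeated rows where Category = product?
To be precise, the required output counts the non-repeated combinations of 'order No', 'product id' and 'Category' (for products only). For order No 71, only index 0 and index 2 are counted. Indices 3 and 4 are repeats, because no different combination appears between index 2 and them, which is how I get 2. Likewise, for order No 72 only indices 8, 10 and 13 are counted, giving a count of 3.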
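Answer 0 (score: 0)
So you want to filter out the rows that are identical to the previous row before counting per order number and category.
For the first part, you can compare the dataframe with a shifted copy of itself and reject the rows where every column is the same: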
print(df.loc[(df.shift()!=df).any(axis=1)])
This gives:
order No product id Category
0 71 123 product
1 71 12 service
2 71 123 product
5 71 15 service
6 71 16 service
7 72 14 service
8 72 112 product
9 72 15 service
10 72 112 product
12 72 12 service
13 72 112 product
14 73 100 service
15 73 101 service
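To see which rows the comparison rejects, the mask can be inspected on its own. This is only a minimal sketch: the names changed and dropped are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service', 'service',
                 'service', 'product', 'service', 'product', 'product', 'service', 'product',
                 'service', 'service'],
})

# True where at least one column differs from the row directly above.
# Row 0 is compared against missing values, so it always counts as changed.
changed = (df.shift() != df).any(axis=1)

# Index labels of the rows that repeat the previous row exactly.
dropped = df.index[~changed].tolist()
print(dropped)
```

Note that the comparison only looks at the immediately preceding row, so a combination that reappears after a different row (index 2 after index 1, for example) is kept.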
To consider only the product category, just add a condition:
df.loc[(~(df.shift() == df).all(axis=1))&(df.Category=='product')]
This gives:
order No product id Category
0 71 123 product
2 71 123 product
8 72 112 product
10 72 112 product
13 72 112 product
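The two snippets above write the "differs from the previous row" test in two ways: (df.shift() != df).any(axis=1) and ~(df.shift() == df).all(axis=1). As far as I can tell they select the same rows on this data, which the following sketch checks (mask_any, mask_all and products are names made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service', 'service',
                 'service', 'product', 'service', 'product', 'product', 'service', 'product',
                 'service', 'service'],
})

# "Some column differs from the previous row" ...
mask_any = (df.shift() != df).any(axis=1)
# ... is the negation of "every column equals the previous row".
mask_all = ~(df.shift() == df).all(axis=1)

# Keep only the product rows that are not a repeat of the row above.
products = df.loc[mask_all & (df['Category'] == 'product')]
print(mask_any.equals(mask_all))
print(products.index.tolist())
```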
Finally, apply groupby and count:
resul = df.loc[(~(df.shift() == df).all(axis=1))
&(df.Category=='product')].groupby(['order No', 'Category']).count()
The result is as expected:
product id
order No Category
71 product 2
72 product 3
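If you want the column to be named as in the question, one possible variant is to use size() and reset_index(name=...) instead of count(). This is a sketch rather than the only way to do it; mask and counts are names chosen for this example, and 'Count of product' is taken from the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service', 'service',
                 'service', 'product', 'service', 'product', 'product', 'service', 'product',
                 'service', 'service'],
})

# Product rows that are not an exact repeat of the row directly above.
mask = (df.shift() != df).any(axis=1) & (df['Category'] == 'product')

# size() counts rows per group; reset_index(name=...) turns the result
# into a flat DataFrame with a named count column.
counts = (
    df.loc[mask]
      .groupby(['order No', 'Category'])
      .size()
      .reset_index(name='Count of product')
)
print(counts)
```

Order 73 does not appear in the result because it has no product rows.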