How to find non-repeated occurrences of a column relative to another column

Time: 2019-06-05 07:43:53

Tags: python python-3.x pandas-groupby

>>> import pandas as pd
>>> df=pd.DataFrame({'order No':[71,71,71,71,71,71,71,72,72,72,72,72,72,72,73,73],'product id':[123,12,123,123,123,15,16,14,112,15,112,112,12,112,100,101],'Category':['product','service','product','product','product','service','service','service','product','service','product','product','service','product','service','service']})
>>> df
    order No  product id Category
0         71         123  product
1         71          12  service
2         71         123  product
3         71         123  product
4         71         123  product
5         71          15  service
6         71          16  service
7         72          14  service
8         72         112  product
9         72          15  service
10        72         112  product
11        72         112  product
12        72          12  service
13        72         112  product
14        73         100  service
15        73         101  service

Expected output:

order No  Category  Count of product
71        product   2
72        product   3

How can I find, for each order No, the number of non-repeated rows with Category = product?


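Actually, the required output is the count of non-repeated ('order No', 'product id', 'Category') rows, for the product category only. For order No 71, only index 0 and index 2 are considered. Indices 3 and 4 are duplicates because they repeat index 2 with no new combination in between; that is how I get 2. Likewise, for order No 72, only indices 8, 10 and 13 must be considered to get the count 3.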

1 Answer:

Answer 0 (score: 0)

So you want to filter out every row that is identical to the row directly above it, and only then count the product rows per order.
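It may help to see why a plain drop_duplicates is not enough here. The sketch below (assuming the same df as in the question) keeps only the product rows and drops every duplicated combination. Because drop_duplicates removes repeats wherever they occur, not only consecutive ones, each order ends up with a single row instead of the expected 2 and 3.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service',
                 'service', 'service', 'product', 'service', 'product', 'product',
                 'service', 'product', 'service', 'service'],
})

products = df[df['Category'] == 'product']

# drop_duplicates keeps only the first occurrence of each full-row combination,
# no matter how far apart the repeats are, so rows 2, 3, 4, 10, 11 and 13 all vanish.
all_dupes_removed = products.drop_duplicates().groupby('order No').size()

print(all_dupes_removed.to_dict())  # {71: 1, 72: 1}, not the expected {71: 2, 72: 3}
```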

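For the first part, you can compare the dataframe with a shifted copy of itself and reject the rows where all columns are identical to the previous row: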

print(df.loc[(df.shift()!=df).any(axis=1)])

gives:

    order No  product id Category
0         71         123  product
1         71          12  service
2         71         123  product
5         71          15  service
6         71          16  service
7         72          14  service
8         72         112  product
9         72          15  service
10        72         112  product
12        72          12  service
13        72         112  product
14        73         100  service
15        73         101  service
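A minimal sketch of what that mask does, using only the first five rows of order 71 (a subset chosen here for illustration). df.shift() moves every row down by one, so row 0 is compared against NaN and is always kept. Because 'order No' is one of the compared columns, the first row of a new order always differs from the last row of the previous order, so no per-order groupby is needed before the shift.

```python
import pandas as pd

# First five rows of the question's data (order 71 only).
df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71],
    'product id': [123, 12, 123, 123, 123],
    'Category': ['product', 'service', 'product', 'product', 'product'],
})

prev = df.shift()           # each row now holds the values of the row above; row 0 is all NaN
differs = prev != df        # element-wise comparison, aligned on index and columns
keep = differs.any(axis=1)  # True if at least one column changed versus the previous row

print(keep.tolist())  # [True, True, True, False, False]
```

Rows 3 and 4 are dropped because every column matches row 2 and row 3 respectively, which is exactly the rule described in the question.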

To keep only the product category, just add a second condition:

df.loc[(~(df.shift() == df).all(axis=1))&(df.Category=='product')]

gives:

    order No  product id Category
0         71         123  product
2         71         123  product
8         72         112  product
10        72         112  product
13        72         112  product
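The two steps build the mask in slightly different ways: (df.shift() != df).any(axis=1) in the first and ~(df.shift() == df).all(axis=1) in the second. By De Morgan's law they select the same rows, including row 0, since a comparison with NaN is False for == and True for !=. A quick check, assuming the question's df:

```python
import pandas as pd

df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service',
                 'service', 'service', 'product', 'service', 'product', 'product',
                 'service', 'product', 'service', 'service'],
})

prev = df.shift()
mask_any = (prev != df).any(axis=1)   # form used in the first step
mask_all = ~(prev == df).all(axis=1)  # form used in the second step

# "some column differs" and "not all columns equal" pick exactly the same rows.
print(mask_any.equals(mask_all))  # True

products = df.loc[mask_any & (df['Category'] == 'product')]
print(products.index.tolist())  # [0, 2, 8, 10, 13]
```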

Finally, groupby and count:

result = df.loc[(~(df.shift() == df).all(axis=1))
                & (df.Category == 'product')].groupby(['order No', 'Category']).count()
print(result)

which is the expected result:

                   product id
order No Category            
71       product            2
72       product            3
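If you want the flat layout from the question, with a 'Count of product' column, one option (sketched here as an alternative, not the answer's exact code) is size() plus reset_index(name=...). Note that count() counts non-null values in each remaining column, which is why the result above is labelled 'product id'; size() simply counts rows per group.

```python
import pandas as pd

df = pd.DataFrame({
    'order No': [71, 71, 71, 71, 71, 71, 71, 72, 72, 72, 72, 72, 72, 72, 73, 73],
    'product id': [123, 12, 123, 123, 123, 15, 16, 14, 112, 15, 112, 112, 12, 112, 100, 101],
    'Category': ['product', 'service', 'product', 'product', 'product', 'service',
                 'service', 'service', 'product', 'service', 'product', 'product',
                 'service', 'product', 'service', 'service'],
})

# Keep rows that differ from the previous row and belong to the product category.
mask = (df.shift() != df).any(axis=1) & (df['Category'] == 'product')

# size() counts rows per group; reset_index turns the group keys back into columns
# and names the count column to match the layout requested in the question.
result = (
    df.loc[mask]
    .groupby(['order No', 'Category'])
    .size()
    .reset_index(name='Count of product')
)
print(result)
```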