Pandas: group by 2 columns so that all rows have unique values

Date: 2020-07-01 18:26:07

Tags: python pandas pandas-groupby

I have a pandas DataFrame like this:

from datetime import datetime as date  # pandas.datetime was deprecated and later removed
import pandas as pd

df = pd.DataFrame({'a':['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], 
                  'date':[date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1), 
                          date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1), 
                          date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1)], 
                  'c':['abc', 'abc', 'abc', 'def', 'xyz', 'abc', 
                       'abc', 'def', 'def']})

   a        date    c
0  A  2020-05-01  abc
1  A  2020-06-01  abc
2  A  2020-07-01  abc
3  B  2020-05-01  def
4  B  2020-06-01  xyz
5  B  2020-07-01  abc
6  C  2020-05-01  abc
7  C  2020-06-01  def
8  C  2020-07-01  def

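I want to keep one row per unique combination of columns 'a' and 'c', count the remaining rows within each group of column 'a' (as a new column 'd'), and show all columns in the result.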

The output should look like this:

   a        date    c    d
0  A  2020-05-01  abc    1
1  B  2020-05-01  def    3
2  B  2020-06-01  xyz    3
3  B  2020-07-01  abc    3
4  C  2020-05-01  abc    2
5  C  2020-06-01  def    2

1 Answer:

Answer 0: (score: 2)

IIUC, you can use drop_duplicates, then groupby and transform with the size of each group:

out = df.drop_duplicates(['a','c']).copy()
out['d'] = out.groupby(['a']).c.transform('size')

print(out)

   a       date    c  d
0  A 2020-05-01  abc  1
3  B 2020-05-01  def  3
4  B 2020-06-01  xyz  3
5  B 2020-07-01  abc  3
6  C 2020-05-01  abc  2
7  C 2020-06-01  def  2
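
For reference, here is a self-contained sketch of the same approach that should run on recent pandas versions. It rebuilds the sample frame with the standard-library datetime, applies the two steps above, and adds reset_index(drop=True) so the index runs 0..5 as in the requested output. The `alt` variant, which counts distinct 'c' values per 'a' with nunique before dropping duplicates, is an assumption about an equivalent formulation and not part of the original answer.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': list('AAABBBCCC'),
    'date': [datetime(2020, m, 1) for m in (5, 6, 7)] * 3,
    'c': ['abc', 'abc', 'abc', 'def', 'xyz', 'abc', 'abc', 'def', 'def'],
})

# Step 1: keep the first row of each unique ('a', 'c') pair
out = df.drop_duplicates(['a', 'c']).copy()

# Step 2: count the rows that remain in each 'a' group
out['d'] = out.groupby('a')['c'].transform('size')

# Optional: renumber the index to match the requested output
out = out.reset_index(drop=True)

# Alternative sketch (assumed equivalent): count distinct 'c' per 'a' first,
# then drop the duplicate ('a', 'c') rows
alt = (
    df.assign(d=df.groupby('a')['c'].transform('nunique'))
      .drop_duplicates(['a', 'c'])
      .reset_index(drop=True)
)

print(out)
```

Both variants keep the first 'date' seen for each ('a', 'c') pair; pass keep='last' to drop_duplicates if the most recent date is wanted instead.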