I have a pandas DataFrame like this:
# pandas.datetime was deprecated in pandas 1.0 and later removed;
# import datetime from the standard library instead.
from datetime import datetime as date
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'date': [date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1),
                            date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1),
                            date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1)],
                   'c': ['abc', 'abc', 'abc', 'def', 'xyz', 'abc',
                         'abc', 'def', 'def']})
   a       date    c
0  A 2020-05-01  abc
1  A 2020-06-01  abc
2  A 2020-07-01  abc
3  B 2020-05-01  def
4  B 2020-06-01  xyz
5  B 2020-07-01  abc
6  C 2020-05-01  abc
7  C 2020-06-01  def
8  C 2020-07-01  def
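I want to group by columns 'a' and 'c', keeping one row per ('a', 'c') pair, then count the remaining rows within each group of column 'a', and show all columns in the result.
The output should look like this: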
   a       date    c  d
0  A 2020-05-01  abc  1
1  B 2020-05-01  def  3
2  B 2020-06-01  xyz  3
3  B 2020-07-01  abc  3
4  C 2020-05-01  abc  2
5  C 2020-06-01  def  2
Answer 0 (score: 2)
IIUC, you can use drop_duplicates to keep the first row of each ('a', 'c') pair, then groupby on 'a' and transform with the size of each group:
out = df.drop_duplicates(['a','c']).copy()
out['d'] = out.groupby(['a']).c.transform('size')
print(out)
   a       date    c  d
0  A 2020-05-01  abc  1
3  B 2020-05-01  def  3
4  B 2020-06-01  xyz  3
5  B 2020-07-01  abc  3
6  C 2020-05-01  abc  2
7  C 2020-06-01  def  2
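Note that the result above keeps the original row labels (0, 3, 4, ...), while the output in the question is numbered 0 to 5. If that matters, a self-contained sketch of the same approach with the index renumbered could look like the following. The reset_index step is my addition and not part of the answer as posted; the sample data is the one from the question.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question: three dates repeated for groups A, B and C.
df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'date': [datetime(2020, 5, 1), datetime(2020, 6, 1), datetime(2020, 7, 1)] * 3,
    'c': ['abc', 'abc', 'abc', 'def', 'xyz', 'abc', 'abc', 'def', 'def'],
})

# Keep only the first row of every ('a', 'c') pair.
out = df.drop_duplicates(['a', 'c']).copy()

# Count the rows left in each 'a' group and broadcast that count to every row.
out['d'] = out.groupby('a')['c'].transform('size')

# Assumption: the asker wants a fresh 0..n-1 index, as in the desired output.
out = out.reset_index(drop=True)

print(out)
```

drop_duplicates keeps the first occurrence by default, which is why each surviving row carries the earliest date for its ('a', 'c') pair.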