I have the following dataframe df:
import pandas as pd
df = pd.DataFrame([[1, 1, 2, 2, 2, 3,4,5,5,5,6,6,6,6],
list('AABBBCDEEEFFFF'),
[1, 2, 3, 4, 5, 6,7,8,9,10,11,12,13,14],
[1, 2, 3, 4, 5, 6,7,8,9,11,12,11,11,11]]).T
df.columns = ['col1','col2','col3','col4']
df
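Side note: the frame is built from rows that mix ints and strings and is then transposed with .T, so every column ends up with object dtype. That is why the answer output further down prints dtype: object. A minimal sketch, assuming you want the numeric columns to be real integers, builds the same data from a dict instead:

```python
import pandas as pd

# Same data as in the question, built column by column (dict of lists)
# so each column keeps its natural dtype instead of falling back to object.
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': list(range(1, 15)),
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 11, 11, 11],
})

print(df.dtypes)  # col1, col3, col4 are integer columns; col2 holds strings
```

If you prefer to keep the .T construction, calling df = df.infer_objects() afterwards should convert the all-integer object columns back to an integer dtype.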
Out[4]:
col1 col2 col3 col4
0 1 A 1 1
1 1 A 2 2
2 2 B 3 3
3 2 B 4 4
4 2 B 5 5
5 3 C 6 6
6 4 D 7 7
7 5 E 8 8
8 5 E 9 9
9 5 E 10 11
10 6 F 11 12
11 6 F 12 11
12 6 F 13 11
13 6 F 14 11
I group it by the columns in the following order:
df.groupby(['col1','col2','col3']).size()
Out[7]:
col1 col2 col3
1 A 1 1
2 1
2 B 3 1
4 1
5 1
3 C 6 1
4 D 7 1
5 E 8 1
9 1
10 1
6 F 11 1
12 1
13 1
14 1
How can I extract the first value of col3 for each (col1, col2) group of the result? Expected output:
df_return
Out[4]:
col3
0 1
1 3
2 6
3 7
4 8
5 11
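A more direct alternative is GroupBy.first() on col3. This is a sketch assuming "first" means the first row of each (col1, col2) group in the frame's current row order; note that first() skips missing values, so strictly it returns the first non-null col3 per group. The df is rebuilt from a dict so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': list(range(1, 15)),
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 11, 11, 11],
})

# One group per (col1, col2) pair; first() takes the first non-null col3
# of each group, following the row order of df.
first_col3 = df.groupby(['col1', 'col2'])['col3'].first()

# Drop the (col1, col2) MultiIndex and return a one-column frame like df_return.
df_return = first_col3.reset_index(drop=True).to_frame()
print(df_return)
```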
Answer:
IIUC, you can use drop_duplicates:
df.sort_values('col3').drop_duplicates(['col1','col2']).col3
Out[1258]:
0 1
2 3
5 6
6 7
7 8
10 11
Name: col3, dtype: object
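The result above keeps the original row labels (0, 2, 5, ...). A sketch that reshapes it into the df_return layout (a single col3 column with a fresh 0..n-1 index) follows. Because of sort_values('col3'), "first" here means the smallest col3 in each group; that coincides with row order only because col3 is increasing in this data.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': list(range(1, 15)),
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 11, 11, 11],
})

df_return = (
    df.sort_values('col3')                # smallest col3 first
      .drop_duplicates(['col1', 'col2'])  # keep='first' by default: one row per (col1, col2)
      [['col3']]                          # keep only col3, still as a DataFrame
      .reset_index(drop=True)             # fresh 0..n-1 index, like df_return
)
print(df_return)
```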
Or you can chain a second groupby on the first two index levels after the groupby size, and take head(1):
df.groupby(['col1','col2','col3']).size().groupby(level=[0,1]).head(1)
Out[1260]:
col1 col2 col3
1 A 1 1
2 B 3 1
3 C 6 1
4 D 7 1
5 E 8 1
6 F 11 1
dtype: int64
Then, to get the values:
df.groupby(['col1','col2','col3']).size().groupby(level=[0,1]).head(1).index.get_level_values(2)
Out[1261]: Int64Index([1, 3, 6, 7, 8, 11], dtype='int64', name='col3')
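Int64Index was removed in pandas 2.0, so on current versions the same call prints a plain Index instead. A minimal sketch that turns those level values into the df_return layout; it relies on groupby sorting its keys by default, so head(1) within each (col1, col2) group picks the smallest col3.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': list(range(1, 15)),
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 11, 11, 11],
})

sizes = df.groupby(['col1', 'col2', 'col3']).size()

# head(1) per (col1, col2): the first (col1, col2, col3) entry of each group;
# the groupby index is sorted, so that is the smallest col3 in each group.
first_rows = sizes.groupby(level=[0, 1]).head(1)

# Pull the col3 level out of the MultiIndex and wrap it in a one-column frame.
col3_values = first_rows.index.get_level_values('col3')
df_return = pd.DataFrame({'col3': col3_values.to_numpy()})
print(df_return)
```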