Pandas groupby: getting the rows with the two smallest values per group

Date: 2017-02-18 05:54:27

Tags: python pandas grouping

I want to group by df["A"] and extract the values in df["B"] that correspond to the two smallest values in df["C"] within each group.

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})

I would like the result to be:

   A    B_1     B_2
0  foo  cat     possum
1  bar  racoon  dog

1 answer:

Answer 0 (score: 2)

I think you need:

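# Index by B so that nsmallest() returns B labels, then take the labels of
# the two smallest C values per group and spread them into columns.
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index))
         .unstack())
df1.columns = df1.columns + 1            # rename columns 0, 1 to 1, 2
df1 = df1.add_prefix('B_').reset_index()
print(df1)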
     A     B_1     B_2
0  foo     cat  possum
1  bar  racoon     dog
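To see why setting B as the index matters, here is a minimal sketch of what happens inside a single group. The names s, foo and foo_labels are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})

# C values labelled by B, so the index carries the animal names.
s = df.set_index('B')['C']

# Restrict to the rows of one group (here A == 'foo').
foo = s[(df['A'] == 'foo').to_numpy()]

# nsmallest(2) keeps the two smallest values; its index holds the B labels.
foo_labels = foo.nsmallest(2).index.tolist()
print(foo_labels)  # ['cat', 'possum']
```

groupby(...).apply(...) repeats this step for every value of A, and unstack() turns the per-group positions 0 and 1 into columns.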

A single-expression solution:

# Same idea, but the column names are assigned directly inside apply().
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index, index=['B_1', 'B_2']))
         .unstack()
         .reset_index())
print(df1)
     A     B_1     B_2
0  foo     cat  possum
1  bar  racoon     dog
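For comparison, the same table can probably also be built without apply(), by sorting, keeping the first two rows per group, and pivoting. This is only a sketch of an alternative, not part of the original answer; the names top2, rank and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})

# Sort by C (stable, so ties keep their original order), then keep the
# first two rows of every group.
top2 = (df.sort_values('C', kind='stable')
          .groupby('A', sort=False)
          .head(2)
          .copy())

# Number the kept rows 1, 2 within each group to build the column labels.
top2['rank'] = 'B_' + (top2.groupby('A').cumcount() + 1).astype(str)

# One row per group, one column per rank.
wide = top2.pivot(index='A', columns='rank', values='B')

# pivot() sorts the index, so restore the order of first appearance in df.
wide = wide.reindex(df['A'].unique()).reset_index().rename_axis(None, axis=1)
print(wide)
```

The trade-off is a few more steps in exchange for avoiding a Python-level lambda per group.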

Edit:

It also works perfectly with datetimes:

import numpy as np

np.random.seed(100)
# Same A and B columns as before; C now holds shuffled dates.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': np.random.choice(pd.date_range('2017-02-18',
                                                       periods=8),
                                         size=8, replace=False)})
print(df)
     A       B          C
0  foo     cat 2017-02-19
1  bar     dog 2017-02-22
2  foo     rat 2017-02-23
3  bar    lion 2017-02-20
4  foo     bat 2017-02-24
5  bar  racoon 2017-02-21
6  foo  possum 2017-02-25
7  foo    deer 2017-02-18

print(df.dtypes)
A            object
B            object
C    datetime64[ns]

# Identical chain; nsmallest() now picks the two earliest dates per group.
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index, index=['B_1', 'B_2']))
         .unstack()
         .reset_index())
print(df1)
     A   B_1     B_2
0  foo  deer     cat
1  bar  lion  racoon
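If more than two values are needed, the first variant generalizes fairly naturally. The helper below is a hypothetical sketch (n_smallest_labels and its parameters are not from the original answer). It uses the variant without a fixed index=['B_1', 'B_2'], since that variant should leave NaN for groups with fewer than n rows instead of raising a length mismatch.

```python
import pandas as pd


def n_smallest_labels(df, group_col, label_col, value_col, n=2):
    """Hypothetical helper: per group, the labels of the n smallest values."""
    out = (df.set_index(label_col)
             .groupby(group_col, sort=False)[value_col]
             .apply(lambda s: pd.Series(s.nsmallest(n).index))
             .unstack())
    # Columns come back as positions 0..n-1; rename them to <label>_1..<label>_n.
    out.columns = [f'{label_col}_{i + 1}' for i in out.columns]
    return out.reset_index()


df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})

result = n_smallest_labels(df, 'A', 'B', 'C', n=2)
print(result)
```

Passing n=3 would add a B_3 column in the same way.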