I want to group by df["A"] and extract the values in df["B"] that correspond to the two smallest values in df["C"] within each group.
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})
I would like the result to be:
     A     B_1     B_2
0  foo     cat  possum
1  bar  racoon     dog
Answer 0 (score: 2)
I think you need groupby with nsmallest, then unstack, add 1 to the column names, add_prefix, and finally reset_index:
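# Move B into the index so that nsmallest returns the matching B labels
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index))
         .unstack())
# Columns are 0, 1 after unstack; shift them to 1, 2 before adding the prefix
df1.columns = df1.columns + 1
df1 = df1.add_prefix('B_').reset_index()
print(df1)
     A     B_1     B_2
0  foo     cat  possum
1  bar  racoon     dog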
A single-expression solution:
# Name the positions directly, so no column renaming is needed afterwards
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index, index=['B_1', 'B_2']))
         .unstack()
         .reset_index())
print(df1)
     A     B_1     B_2
0  foo     cat  possum
1  bar  racoon     dog
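For comparison, the same table can be built without apply. The following is only a sketch of an alternative, not the method from the answer above: it sorts by C, keeps the first two rows per group with head, numbers them with cumcount, and spreads them with pivot. The names top, pos, order and df2 are illustrative and do not appear in the original.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   'C': [1, 2, 6, 4, 3, 1, 2, 4]})

# Keep the two rows with the smallest C in each group; a stable sort
# keeps the first occurrence when values of C are tied.
top = df.sort_values('C', kind='mergesort').groupby('A', sort=False).head(2)

# Number the kept rows 1, 2 within each group ('pos' is an illustrative name).
top = top.assign(pos=top.groupby('A').cumcount() + 1)

# Groups in order of first appearance, to mirror sort=False in the answer.
order = pd.Index(df['A'].unique(), name='A')

# Spread the B labels into one column per position.
df2 = (top.pivot(index='A', columns='pos', values='B')
          .add_prefix('B_')
          .reindex(order)
          .rename_axis(columns=None)
          .reset_index())
print(df2)
```

On the sample data this should give the same rows as the apply-based version; which of the two is preferable on large frames would need to be measured.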
EDIT:
It also works perfectly with datetime values:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['cat', 'dog', 'rat', 'lion',
                         'bat', 'racoon', 'possum', 'deer'],
                   # Eight distinct dates in random order
                   'C': np.random.choice(pd.date_range('2017-02-18',
                                                       periods=8),
                                         size=8, replace=False)})
print(df)
     A       B          C
0  foo     cat 2017-02-19
1  bar     dog 2017-02-22
2  foo     rat 2017-02-23
3  bar    lion 2017-02-20
4  foo     bat 2017-02-24
5  bar  racoon 2017-02-21
6  foo  possum 2017-02-25
7  foo    deer 2017-02-18
print(df.dtypes)
A            object
B            object
C    datetime64[ns]
df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(lambda x: pd.Series(x.nsmallest(2).index, index=['B_1', 'B_2']))
         .unstack()
         .reset_index())
print(df1)
     A   B_1     B_2
0  foo  deer     cat
1  bar  lion  racoon
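One caveat with the hard-coded index=['B_1', 'B_2']: it presumably assumes every group has at least two rows, since a shorter group would yield fewer labels than names. A minimal sketch that builds the names from the number of labels actually found is shown below. The helper two_smallest_labels, its parameter n, and the small sample frame are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical sample in which 'bar' and 'baz' have only one row each
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'baz'],
                   'B': ['cat', 'dog', 'rat', 'lion'],
                   'C': [1, 2, 6, 4]})

def two_smallest_labels(x, n=2):
    """Return the index labels of the n smallest values, named B_1, B_2, ..."""
    labels = x.nsmallest(n).index
    # Only create as many names as there are labels in this group
    names = [f'B_{i + 1}' for i in range(len(labels))]
    return pd.Series(labels, index=names)

df1 = (df.set_index('B')
         .groupby('A', sort=False)['C']
         .apply(two_smallest_labels)
         .unstack()
         .reset_index())
print(df1)
```

Groups with a single row then get a missing value in B_2 instead of raising an error.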