pandas df locate: keep only the first item

Time: 2017-01-17 06:58:16

Tags: python pandas

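I want to get the value of one column based on the value in another column of the same row.

Example:

For business ID = 123, I want to retrieve the business name (biz_name).

DF: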

biz_id  biz_name
123      chew
456      bite
123      chew

Code:

df['biz_name'].loc[df['biz_id'] == 123]

This returns:

chew
chew

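How do I get a single value, 'chew', as a string?

2 answers:

Answer 0: (score: 2)

Use idxmax to get the index label of the first maximum value. Applied to the boolean mask from eq(123), that is the first row where biz_id equals 123: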

df.loc[df.biz_id.eq(123).idxmax(), 'biz_name']

'chew'
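
A minimal sketch of why this works, and of one caveat that is worth checking. The DataFrame below is rebuilt from the question's sample data; the id 999 and the variable names are illustrative assumptions. When no row matches, the mask is all False and idxmax still returns the first label, so a guard with any() seems prudent.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

mask = df["biz_id"].eq(123)        # boolean Series: True, False, True
first_label = mask.idxmax()        # label of the first True
name = df.loc[first_label, "biz_name"]
print(repr(name))

# Caveat: with no match the mask is all False, and idxmax still
# returns the first label, which would silently pick the wrong row.
missing = df["biz_id"].eq(999)     # 999 is a hypothetical id not in the data
wrong_label = missing.idxmax()
safe_name = df.loc[wrong_label, "biz_name"] if missing.any() else None
print(safe_name)
```

Note that this also assumes the index labels are unique; with duplicate labels, df.loc[label, 'biz_name'] would return a Series rather than a single string.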

Answer 1: (score: 1)

You can use iloc or iat to select the first value of the Series:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0])
chew

Or:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0])
chew
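
Both iloc[0] and iat[0] raise an IndexError when no row matches the condition. As a hedged sketch, the lookup could be wrapped in a small helper; first_match and its default parameter are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

def first_match(df, key_col, key, value_col, default=None):
    """Return the first value of value_col where key_col == key.

    Hypothetical helper: falls back to `default` when nothing matches,
    instead of raising IndexError like a bare .iat[0] would.
    """
    matches = df.loc[df[key_col] == key, value_col]
    if matches.empty:
        return default
    return matches.iat[0]

# Sample data rebuilt from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

print(first_match(df, "biz_id", 123, "biz_name"))
print(first_match(df, "biz_id", 999, "biz_name"))  # 999 is not in the data
```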

Use query:

print (df.query('biz_id == 123')['biz_name'].iloc[0])
chew

Or select the first value from a list or numpy array:

print (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0])
chew

print (df.loc[df['biz_id'] == 123, 'biz_name'].values[0])
chew

Timings:

In [18]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0])
1000 loops, best of 3: 399 µs per loop

In [19]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0])
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 391 µs per loop

In [20]: %timeit (df.query('biz_id == 123')['biz_name'].iloc[0])
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.75 ms per loop

In [21]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0])
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 384 µs per loop

In [22]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].values[0])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 370 µs per loop

In [23]: %timeit (df.loc[df.biz_id.eq(123).idxmax(), 'biz_name'])
1000 loops, best of 3: 517 µs per loop
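
The numbers above come from a three-row frame in IPython and will differ by machine and pandas version. A rough way to re-run the comparison outside IPython, using the standard timeit module, might look like the sketch below; the labels in the candidates dict and the loop counts are arbitrary choices made for illustration.

```python
import timeit
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

# One callable per approach shown in the answers
candidates = {
    "iloc": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iloc[0],
    "iat": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iat[0],
    "query": lambda: df.query("biz_id == 123")["biz_name"].iloc[0],
    "tolist": lambda: df.loc[df["biz_id"] == 123, "biz_name"].tolist()[0],
    "values": lambda: df.loc[df["biz_id"] == 123, "biz_name"].values[0],
    "idxmax": lambda: df.loc[df["biz_id"].eq(123).idxmax(), "biz_name"],
}

loops = 200
results = {}
for label, func in candidates.items():
    assert func() == "chew"  # every approach should agree on the answer
    # Best of 3 repeats, converted to seconds per single call
    results[label] = min(timeit.repeat(func, number=loops, repeat=3)) / loops

for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{label:>7}: {seconds * 1e6:.1f} us per call")
```

On a frame this small the differences mostly reflect fixed overhead, so it is worth repeating the measurement on data of realistic size before choosing an approach for speed.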