Question

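In a pandas DataFrame, I want to find the index of the row whose value in a given column is closest to (but below) a specified value. Specifically, say I am given the number 40 and the DataFrame df: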

|    |   x |
|---:|----:|
|  0 |  11 |
|  1 |  15 |
|  2 |  17 |
|  3 |  25 |
|  4 |  54 |

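I want to find the index of the row such that df["x"] is lower than 40 but as close to it as possible. Here the answer is 3, because df.loc[3, 'x'] = 25 is smaller than the given number 40 but closest to it. My DataFrame has other columns, but I can assume that the "x" column is increasing.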

For an exact match I did this (please correct me if there is a better way):

    matches = df[df.x == number].index.tolist()
    if matches:
        result = matches[0]

But for the general case, I don't know how to do it in a "vectorized" way.

Answer 1

Filter the rows below 40 with Series.lt in boolean indexing, then get the index of the closest value with Series.idxmax:

a = df.loc[df['x'].lt(40), 'x'].idxmax()
print (a)
3
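One thing to watch with the filter above: if no value lies below the threshold, the filtered Series is empty and idxmax has nothing to return. Here is a minimal sketch of a guard, assuming you want None in that case. The helper name closest_below is made up for illustration and is not part of pandas.

```python
import pandas as pd

def closest_below(df, column, number):
    """Return the index label of the largest value strictly below `number`,
    or None if no such row exists (hypothetical helper, not a pandas API)."""
    below = df.loc[df[column].lt(number), column]
    if below.empty:
        return None
    return below.idxmax()

df = pd.DataFrame({'x': [11, 15, 17, 25, 54]})
print(closest_below(df, 'x', 40))
print(closest_below(df, 'x', 5))
```

The comparison is strict, so a value equal to the threshold is not selected.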

For better performance, numpy.where can be combined with np.max; this solution works if the index is the default RangeIndex:

a = np.max(np.where(df['x'].lt(40))[0])
print (a)
3

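If the index is not the default RangeIndex, np.where returns a position, which has to be mapped back to a label through df.index: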

df = pd.DataFrame({'x':[11,15,17,25,54]}, index=list('abcde'))

a = np.max(np.where(df['x'].lt(40))[0])
print (a)
3

print (df.index[a])
d
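Since the question says the "x" column is increasing, a binary search might also be worth a try. This is only a sketch of a different technique (np.searchsorted) than the one used above, and it assumes "x" really is sorted in ascending order; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [11, 15, 17, 25, 54]}, index=list('abcde'))
number = 40

# With side='left', searchsorted returns how many values are strictly
# less than `number`; subtracting 1 gives the position of the last of them.
pos = np.searchsorted(df['x'].to_numpy(), number, side='left') - 1

# pos == -1 means no value lies below `number`
label = df.index[pos] if pos >= 0 else None
print(pos, label)
```

This avoids building a boolean mask over the whole column, but it gives wrong results if the column is not sorted.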

Answer 2

How about:

import pandas as pd

data  = {'x':[0,1,2,3,4,20,50]}

df = pd.DataFrame(data)

#keep only the rows strictly below 40
sub_df = df[df['x'] < 40]

#get the index label of the maximum of 'x'
idx = sub_df['x'].idxmax()

print(idx)

Answer 3

Use Series.where to mask the values greater than or equal to n, then use Series.idxmax to get the closest one:

n=40
val = df['x'].where(df['x'].lt(n)).idxmax()
print(val)
3

We can also use Series.mask:

df['x'].mask(df['x'].ge(40)).idxmax()

Or a callable with loc[]:

df['x'].loc[lambda x: x.lt(40)].idxmax()
#alternative
#df.loc[lambda col: col['x'].lt(40),'x'].idxmax()

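If the index is not the default RangeIndex, get the position first and then the label (the position is only valid in df because 'x' is increasing, so the filtered rows form a prefix of df):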

i = df.loc[lambda col: col['x'].lt(40),'x'].reset_index(drop=True).idxmax()
df.index[i]
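For what it's worth, with a labeled index idxmax already returns the label, so you can also go the other way and look up the position from the label. A small self-contained sketch, assuming the index labels are unique (Index.get_loc returns a plain integer only in that case):

```python
import pandas as pd

df = pd.DataFrame({'x': [11, 15, 17, 25, 54]}, index=list('abcde'))
n = 40

# Values >= n become NaN, which idxmax skips by default
label = df['x'].where(df['x'].lt(n)).idxmax()

# Convert the label back to an integer position (unique index assumed)
position = df.index.get_loc(label)
print(label, position)
```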

Pandas.DataFrame: find the index of the row whose value in a given column is closest to (but below) a specified value
