Question

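How do I find the index of an element (e.g. "True") in a Series or column?

For example, I have a column and want to identify the first instance where an event occurs. So I write it as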

Variable = df["Force"] < event

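This creates a Boolean Series that is False until the first instance where it becomes True. How do I then find the index of that data point?

Is there a better way?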

Answer 1

Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.

df['Force'].lt(event).idxmax()

Consider the sample df:

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df

   Force
a      5
b      4
c      3
d      2
e      1

The first instance where Force is less than 3 is at index 'd'.

df['Force'].lt(3).idxmax()
'd'

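Note that if no value of Force is less than 3, the maximum is False, so idxmax returns the first label in the index rather than signalling that nothing matched.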
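To make that caveat concrete, here is a minimal sketch that guards idxmax with a check on whether any value is True. The helper name first_true_label is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

def first_true_label(mask):
    """Return the label of the first True in a boolean Series, or None."""
    # idxmax alone would return the first label even when nothing is True,
    # so check mask.any() before trusting its result
    return mask.idxmax() if mask.any() else None

hit = first_true_label(df['Force'].lt(3))    # some values are below 3
miss = first_true_label(df['Force'].lt(0))   # no value is below 0
unguarded = df['Force'].lt(0).idxmax()       # all False: the first label comes back
```

With the guard, a missing event shows up as None instead of being confused with a match on the first row.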

Also consider the alternative, argmax:

df.Force.lt(3).values.argmax()
3

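It returns the position of the first instance of the maximum value. You can then use that position to look up the corresponding index value: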

df.index[df.Force.lt(3).values.argmax()]
'd'

In addition, recent pandas versions provide argmax directly as a Series method that returns the integer position.

Answer 2

You can also try first_valid_index together with where.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3

By default, where replaces the entries that do not satisfy the condition with np.nan. We then find the first valid index in the resulting Series.
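As a small illustration of the two steps, the sketch below keeps the intermediate Series and also shows the case where nothing satisfies the condition. The variable names masked, first and none_found are only for this example.

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

masked = df.Force.where(df.Force < 3)   # rows failing the condition become NaN
first = masked.first_valid_index()      # label of the first non-NaN entry

# When nothing matches, every entry is NaN and first_valid_index returns None
none_found = df.Force.where(df.Force < 0).first_valid_index()
```

Unlike idxmax on a boolean mask, this approach reports a missing match as None.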

Alternatively: select the subset of items you are interested in, here Variable == 1, and then take the first item of its index.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]

Bonus: if you need the index of the first occurrence of several kinds of items, you can use drop_duplicates.

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"],  ["blue"], ["red"]], columns=["Force"])  
df.Force.drop_duplicates().reset_index()
    index   Force
0   0       yello
1   2       blue
2   3       red

With a little more work...

df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}

Answer 3

Here is a non-pandas solution that I find easy to adapt:

import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

next(idx for idx, x in zip(df.index, df.Force) if x < 3)  # d

It works by iterating only as far as the first result of the generator expression.
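One thing to watch: next raises StopIteration when the generator yields nothing. A minimal sketch of a variant that returns a default instead is shown here; the helper first_label and its parameters are hypothetical names for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

def first_label(series, predicate, default=None):
    """Label of the first value satisfying predicate, else default."""
    matches = (idx for idx, x in zip(series.index, series) if predicate(x))
    # the second argument to next() is returned when the generator is exhausted
    return next(matches, default)

found = first_label(df.Force, lambda x: x < 3)
missing = first_label(df.Force, lambda x: x < 0)
```

Because the generator is lazy, the loop still stops at the first match.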

The pandas approaches perform poorly in comparison:

import numpy as np

df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))

n = 99900

%timeit df['Force'].lt(n).idxmin()
# 1000 loops, best of 3: 1.57 ms per loop

%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop

%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop

Answer 4

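Here is an all-pandas solution that I think is cleaner than some of the other answers. It also handles the corner case where none of the values in the input series satisfies the condition.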

def first_index_ordered(mask):
    assert mask.index.is_monotonic_increasing
    assert mask.dtype == bool
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else idx_min

col = "foo"
thr = 42
mask = df[col] < thr
idx_first = first_index_ordered(mask)

The above assumes that mask has an ordered, monotonically increasing index. If that is not the case, we have to do a bit more:

def first_index_unordered(mask):
    assert mask.dtype == bool
    index = mask.index
    # This creates a RangeIndex, which is monotonic
    mask = mask.reset_index(drop=True)
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else index[idx_min]

Of course, we can combine both cases in a single function:

def first_index_where(mask):
    if mask.index.is_monotonic_increasing:
        return first_index_ordered(mask)
    else:
        return first_index_unordered(mask)
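For comparison, here is an alternative sketch that does not depend on the index ordering at all: it looks up the integer positions of the True entries with NumPy and maps the first one back to its label. This is a different technique from the functions above, and the name first_true_by_position is made up for this example.

```python
import numpy as np
import pandas as pd

def first_true_by_position(mask):
    """Label of the first True in row order, or None if there is none."""
    positions = np.flatnonzero(mask.to_numpy())  # integer positions of True entries
    return mask.index[positions[0]] if positions.size else None

s = pd.Series([50, 40, 30, 60], index=[7, 2, 9, 4])  # index is not sorted
first_hit = first_true_by_position(s < 45)   # first row below 45 carries label 2
no_hit = first_true_by_position(s < 0)       # nothing matches
```

It returns the label of the first matching row in row order, and None when no row matches.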
