Question

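Is there a more efficient way, in pandas, numpy, or another library, to find and return the indices of local maxima that also exceed a threshold and are separated by a distance called local_max, as shown below? I think my code works, but it isn't very clean.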

import pandas as pd
import numpy as np

np.random.seed(101)
df = pd.DataFrame(np.random.randn(20,1),columns=['surge'])

    surge
0   2.706850
1   0.628133
2   0.907969
3   0.503826
4   0.651118
5   -0.319318
6   -0.848077
7   0.605965
8   -2.018168
9   0.740122
10  0.528813
11  -0.589001
12  0.188695
13  -0.758872
14  -0.933237
15  0.955057
16  0.190794
17  1.978757
18  2.605967
19  0.683509

surge_threshold = .7
local_max = 5 # separate results exceeding the surge threshold by 5 rows and return the highest local surge value.

# look for a surge
df.dropna(inplace=True)
i = df.first_valid_index()
markers = []
while i <= df.index[-1]:  # inclusive bound so the last row is checked too
    if df.loc[i,'surge'] > surge_threshold:
        # check if markers is an empty list
        if markers:
            if (i - markers[-1] < local_max):
                if (df.loc[i,'surge'] >= df.loc[markers[-1],'surge']):
                    markers[-1] = i
            else:
                markers.append(i)
        else:
            markers.append(i)
    i += 1
print(markers)

For this df the result is [0, 9, 18], because these are the indices of the local maxima that exceed .7 and are at least 5 rows apart.
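One possible way to tidy this up, offered as a sketch rather than a definitive answer: filter the rows above the threshold with a boolean mask first, then run the same "keep the larger value within local_max rows" rule over only those candidates. The function name `surge_markers` and its parameter names are made up for this illustration. It assumes a default integer index with no NaN values, so row positions and index labels coincide; the surge values are typed in from the listing above instead of being regenerated.

```python
import numpy as np
import pandas as pd

# Values copied from the listing in the question.
df = pd.DataFrame({'surge': [
    2.706850, 0.628133, 0.907969, 0.503826, 0.651118,
    -0.319318, -0.848077, 0.605965, -2.018168, 0.740122,
    0.528813, -0.589001, 0.188695, -0.758872, -0.933237,
    0.955057, 0.190794, 1.978757, 2.605967, 0.683509,
]})

def surge_markers(series, threshold, min_gap):
    """Hypothetical helper: index labels of the kept surge markers.

    Assumes a default integer index and no NaN values, so that
    positional distance equals label distance.
    """
    values = series.to_numpy()
    # Positions of all rows above the threshold (vectorized filter).
    candidates = np.flatnonzero(values > threshold)
    markers = []
    for pos in candidates:
        if markers and pos - markers[-1] < min_gap:
            # Too close to the previous marker: keep whichever is larger.
            if values[pos] >= values[markers[-1]]:
                markers[-1] = pos
        else:
            markers.append(pos)
    # Map positions back to index labels.
    return series.index[np.asarray(markers, dtype=int)].tolist()

print(surge_markers(df['surge'], threshold=0.7, min_gap=5))  # [0, 9, 18]
```

The Python loop now only visits rows that pass the threshold, which is usually a small fraction of the frame. Note that the rule still measures distance from the most recently updated marker, exactly as in the loop above, so it is not the same as a strict neighbor-based peak finder.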

Multiple local maxima above a threshold in a pandas column

0 answers: