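Is there a more efficient way, in pandas, numpy, or another library, to find and return the indices of local maxima that also exceed a threshold and are separated by a minimum distance called local_max, as shown below? I think my code works, but it isn't very clean.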
import pandas as pd
import numpy as np
np.random.seed(101)
df = pd.DataFrame(np.random.randn(20,1),columns=['surge'])
       surge
0   2.706850
1   0.628133
2   0.907969
3   0.503826
4   0.651118
5  -0.319318
6  -0.848077
7   0.605965
8  -2.018168
9   0.740122
10  0.528813
11 -0.589001
12  0.188695
13 -0.758872
14 -0.933237
15  0.955057
16  0.190794
17  1.978757
18  2.605967
19  0.683509
surge_threshold = .7
local_max = 5  # keep results exceeding the surge threshold at least 5 rows apart, retaining the highest local surge value
# look for a surge
df.dropna(inplace=True)
i = df.first_valid_index()
markers = []
# use <= on i itself so the final row is checked too
while i <= df.index[-1]:
    if df.loc[i, 'surge'] > surge_threshold:
        # check if markers is an empty list
        if markers:
            if i - markers[-1] < local_max:
                # within local_max rows of the last marker: keep the larger surge
                if df.loc[i, 'surge'] >= df.loc[markers[-1], 'surge']:
                    markers[-1] = i
            else:
                markers.append(i)
        else:
            markers.append(i)
    i += 1
print(markers)
The result for this df is [0, 9, 18], because those are the indices of the local maxima that exceed .7 and are at least 5 rows apart.
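For reference, here is a minimal sketch of the same greedy scan written as a function. It only visits the rows that already exceed the threshold, which it finds with np.flatnonzero. The name surge_markers and its parameters are hypothetical, not from any library. It works on positions rather than index labels, so it assumes a default 0..n-1 index. The surge values are copied from the listing above so the snippet runs on its own.

```python
import numpy as np

def surge_markers(values, threshold, min_gap):
    """Hypothetical helper: greedy left-to-right scan over above-threshold rows."""
    values = np.asarray(values, dtype=float)
    markers = []
    # Positions (not index labels) of the values that exceed the threshold.
    for i in np.flatnonzero(values > threshold):
        if markers and i - markers[-1] < min_gap:
            # Too close to the previous marker: keep whichever value is larger.
            if values[i] >= values[markers[-1]]:
                markers[-1] = int(i)
        else:
            markers.append(int(i))
    return markers

# Values copied from the df shown in the question.
surge = [2.706850, 0.628133, 0.907969, 0.503826, 0.651118,
         -0.319318, -0.848077, 0.605965, -2.018168, 0.740122,
         0.528813, -0.589001, 0.188695, -0.758872, -0.933237,
         0.955057, 0.190794, 1.978757, 2.605967, 0.683509]

print(surge_markers(surge, threshold=0.7, min_gap=5))  # [0, 9, 18]
```

With the DataFrame from the question, the call would be surge_markers(df['surge'].to_numpy(), surge_threshold, local_max). The rule is the same as in the loop above, so this is only tidier, not a different definition of a local maximum.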