Given the following DataFrame:
import pandas as pd
df = pd.DataFrame([['a', 1], ['b', 3], ['c', 7], ['d', 4], ['e', 1], ['f', 2], ['g', 9], ['h', 4], ['i', 0]])
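Is there a better way to find the last local maximum ('g') than iterating over the rows in reverse and comparing each value with the one visited just before it?
This is what I am currently using; there must be something more efficient: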
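df.columns = ['x', 'y']
first = True
prev_val = None
prev_row = None
# Walk the rows from last to first and stop at the first drop in 'y'.
for index, row in df[::-1].iterrows():
    if first:
        prev_val = row['y']
        prev_row = row['x']
        first = False
    else:
        if row['y'] >= prev_val:
            prev_val = row['y']
            prev_row = row['x']
        else:
            break
# prev_row now holds the label of the last local maximum ('g')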
Answer 0 (score: 2)
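In general, you should try to avoid manual loops, especially with iterrows. numba is an exception: it compiles the loop so that the iteration runs efficiently at a lower level: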
import pandas as pd
from numba import jit

df = pd.DataFrame([['a', 1], ['b', 3], ['c', 9], ['d', 4], ['e', 1],
                   ['f', 2], ['g', 7], ['h', 4], ['i', 0]])

@jit(nopython=True)
def local_max_idx(A):
    # Scan from the end; return the negative index of the first element
    # that is strictly greater than its left neighbour.
    for i in range(1, len(A)):
        if A[-(i+1)] < A[-i]:
            return -i

res = df[0].iat[local_max_idx(df[1].values)]  # 'g'
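To see what the reverse scan above does at the edges, here is a minimal sketch of the same loop in plain Python, without numba. The helper name `last_rise_from_end` is hypothetical, and the explicit `return None` is my addition to make the "no rise found" case visible; whether the compiled version behaves identically in that case is not checked here.

```python
import numpy as np
import pandas as pd

def last_rise_from_end(values):
    """Hypothetical helper mirroring the reverse scan: return the negative
    index of the first element, scanning from the end, that is strictly
    greater than its left neighbour, or None if there is no such element."""
    for i in range(1, len(values)):
        if values[-(i + 1)] < values[-i]:
            return -i
    return None

df = pd.DataFrame([['a', 1], ['b', 3], ['c', 9], ['d', 4], ['e', 1],
                   ['f', 2], ['g', 7], ['h', 4], ['i', 0]])

idx = last_rise_from_end(df[1].to_numpy())
label = df[0].iat[idx] if idx is not None else None

# Edge cases: a non-increasing array has no rise at all, and a strictly
# increasing array reports its last element (an endpoint, not an interior peak).
no_rise = last_rise_from_end(np.array([3, 2, 1]))
endpoint = last_rise_from_end(np.array([1, 2, 3]))
```

Note that the scan only compares each element with its left neighbour, so an array that rises all the way to its end yields the final element. That may or may not be what you want to call a local maximum.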
Performance benchmarking
import numpy as np
import pandas as pd
from numba import jit
from scipy.signal import argrelextrema

# Worst case for the reverse scan: the only local maximum sits near the start.
n = 1000000
df = pd.Series([0] + list(range(n, 0, -1))).to_frame().reset_index()
df.columns = [0, 1]

@jit(nopython=True)
def local_max_idx(A):
    for i in range(1, len(A)):
        if A[-(i+1)] < A[-i]:
            return -i

# %timeit is an IPython/Jupyter magic
%timeit df.iat[argrelextrema(df[1].values, np.greater)[0][-1], 0]  # 46.1 ms per loop
%timeit df[0].iat[local_max_idx(df[1].values)]                     # 1.59 ms per loop
Answer 1 (score: 1)
import numpy as np
from scipy.signal import argrelextrema

# Assumes df still has the default integer column labels 0 and 1.
a = df.iat[argrelextrema(df[1].values, np.greater)[0][-1], 0]
print(a)  # g
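If SciPy is not available, the same idea can be sketched with NumPy alone: mark every interior point that is strictly greater than both neighbours and take the last one. This is an illustrative alternative, not part of the answer above. It should match `argrelextrema(..., np.greater)` with its default settings on data like this, but I have only checked it against the question's example. The variable names are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 3], ['c', 7], ['d', 4], ['e', 1],
                   ['f', 2], ['g', 9], ['h', 4], ['i', 0]])

y = df[1].to_numpy()

# Interior points strictly greater than both the left and right neighbour.
is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])

# +1 because the mask starts at position 1 of the original array.
peak_positions = np.flatnonzero(is_peak) + 1

# Label of the last strict local maximum, or None if there is none.
last_label = df.iat[peak_positions[-1], 0] if peak_positions.size else None
print(last_label)
```

Unlike the reverse scan, this always touches the whole array, so it trades the early exit for a fully vectorized pass. The first and last rows are never reported as peaks.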