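I am trying to compute the local maxima and minima of a series of data: if the value in the current row is greater than (or less than) both the next row and the previous row, set the new column to the current value, otherwise set it to NaN. Is there a more elegant way to do this than the following?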
import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2014', periods=10, freq='5min')
s = pd.Series([1, 2, 3, 2, 1, 2, 3, 5, 7, 4], index=rng)
df = pd.DataFrame(s, columns=['val'])
df.index.name = "dt"
df['minmax'] = np.nan

for i in range(len(df.index)):
    # The first and last rows have only one neighbour, so skip them.
    if i == 0 or i == len(df.index) - 1:
        continue
    cur = df['val'].iloc[i]
    # Local maximum: not smaller than either neighbour.
    if cur >= df['val'].iloc[i - 1] and cur >= df['val'].iloc[i + 1]:
        # .loc avoids chained assignment, which may write to a copy.
        df.loc[df.index[i], 'minmax'] = cur
        continue
    # Local minimum: not larger than either neighbour.
    if cur <= df['val'].iloc[i - 1] and cur <= df['val'].iloc[i + 1]:
        df.loc[df.index[i], 'minmax'] = cur
        continue

print(df)
The result is:

                     val  minmax
dt
2014-01-01 00:00:00    1     NaN
2014-01-01 00:05:00    2     NaN
2014-01-01 00:10:00    3       3
2014-01-01 00:15:00    2     NaN
2014-01-01 00:20:00    1       1
2014-01-01 00:25:00    2     NaN
2014-01-01 00:30:00    3     NaN
2014-01-01 00:35:00    5     NaN
2014-01-01 00:40:00    7       7
2014-01-01 00:45:00    4     NaN
Answer 0 (score: 0)
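We can use shift and where to decide which values to assign. The important point is that when comparing Series we must use the bitwise operators & and | rather than the keywords and / or. shift returns a Series or DataFrame shifted by 1 row (the default) or by the number of rows passed in.
When using where, we pass a boolean condition; the second argument, NaN, tells it to assign this value wherever the condition is False.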
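As a minimal sketch of how shift and where interact, here is a toy three-element Series. The sample values and the names prev, nxt, is_peak and peaks are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 2])

# shift(1) moves values down one row, so each row sees its predecessor;
# shift(-1) moves them up one row, so each row sees its successor.
prev = s.shift(1)
nxt = s.shift(-1)

# Comparisons against the NaN introduced by shift evaluate to False,
# so the first and last rows can never qualify.
is_peak = (s > prev) & (s > nxt)

# where keeps the value where the condition is True and fills np.nan elsewhere.
peaks = s.where(is_peak, np.nan)
print(peaks)
```

Each comparison is wrapped in its own parentheses because & and | bind more tightly than < and > in Python.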
In [81]:
df['minmax'] = df['val'].where(
    ((df['val'] < df['val'].shift(1)) & (df['val'] < df['val'].shift(-1)))
    | ((df['val'] > df['val'].shift(1)) & (df['val'] > df['val'].shift(-1))),
    np.nan)
df
Out[81]:
                     val  minmax
dt
2014-01-01 00:00:00    1     NaN
2014-01-01 00:05:00    2     NaN
2014-01-01 00:10:00    3       3
2014-01-01 00:15:00    2     NaN
2014-01-01 00:20:00    1       1
2014-01-01 00:25:00    2     NaN
2014-01-01 00:30:00    3     NaN
2014-01-01 00:35:00    5     NaN
2014-01-01 00:40:00    7       7
2014-01-01 00:45:00    4     NaN
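For reuse, the same idea could be wrapped in a small function. The sketch below is one possible arrangement and not the answerer's code: the helper name local_extrema and its strict flag are hypothetical. It assumes the question's sample data.

```python
import numpy as np
import pandas as pd


def local_extrema(values: pd.Series, strict: bool = True) -> pd.Series:
    """Keep values that are local maxima or minima; NaN elsewhere.

    Hypothetical helper: the name and the `strict` flag are illustrative.
    """
    prev, nxt = values.shift(1), values.shift(-1)
    if strict:
        # Strict comparisons, as in the shift/where expression above.
        is_max = (values > prev) & (values > nxt)
        is_min = (values < prev) & (values < nxt)
    else:
        # >= / <= comparisons, as in the question's loop. Comparisons with
        # the NaN neighbours at both ends are False, so endpoints stay NaN.
        is_max = (values >= prev) & (values >= nxt)
        is_min = (values <= prev) & (values <= nxt)
    return values.where(is_max | is_min, np.nan)


rng = pd.date_range('1/1/2014', periods=10, freq='5min')
df = pd.DataFrame({'val': [1, 2, 3, 2, 1, 2, 3, 5, 7, 4]}, index=rng)
df.index.name = 'dt'
df['minmax'] = local_extrema(df['val'])
print(df)
```

The two modes agree on this sample data but differ on plateaus: with strict=False, two equal adjacent values that sit above (or below) their outer neighbours are both kept, which matches the >= and <= tests in the question's loop, whereas the strict form marks neither.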