I have the following DataFrame:
date value
2014-01-20 10
2014-01-21 12
2014-01-22 13
2014-01-23 9
2014-01-24 7
2014-01-25 12
2014-01-26 11
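I need to be able to track when the most recent maximum and minimum occurred within a given rolling window. For example, with a rolling window period of 5, I need output like this: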
date value rolling_max_date rolling_min_date
2014-01-20 10 2014-01-20 2014-01-20
2014-01-21 12 2014-01-21 2014-01-20
2014-01-22 13 2014-01-22 2014-01-20
2014-01-23 9 2014-01-22 2014-01-23
2014-01-24 7 2014-01-22 2014-01-24
2014-01-25 12 2014-01-22 2014-01-24
2014-01-26 11 2014-01-22 2014-01-24
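All this shows is the date of the most recent maximum and minimum within the rolling window. I know pandas has rolling_min and rolling_max, but I don't know how to track the index/date of the most recent max/min within the window.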
Answer 0 (score: 4)
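There is a more general rolling_apply, to which you can pass your own function. However, the custom function receives each window as an array rather than a DataFrame, so the index information is not available (and therefore you cannot use idxmin/idxmax).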
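As a minimal illustration of that limitation, here is a sketch that assumes a newer pandas release, where the rolling API is written `Series.rolling(...).apply(...)` rather than `pd.rolling_apply`. The helper `local_argmin` and the set `seen_types` are names made up for this example; the data is the series from the question.

```python
import pandas as pd

# The question's data, indexed by date.
s = pd.Series([10, 12, 13, 9, 7, 12, 11],
              index=pd.date_range("2014-01-20", periods=7, freq="D"))

seen_types = set()  # records what kind of object each window is

def local_argmin(window):
    # With raw=True the window is a plain NumPy array: no dates attached.
    seen_types.add(type(window).__name__)
    return window.argmin()  # position inside this window only

local_pos = s.rolling(5, min_periods=1).apply(local_argmin, raw=True)
print(seen_types)
print(local_pos.tolist())
```

The function only ever sees bare arrays, so the best it can return is a position relative to the current window. As far as I know, passing `raw=False` hands over a Series instead, but the function still has to return a single number, so a date cannot be returned directly either way.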
But let's try to achieve this in two steps:
In [41]: df = df.set_index('date')
In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
Out[42]:
value
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 3
2014-01-26 2
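This gives you the position within the window at which the minimum is found. However, this position is relative to that particular window, not to the whole DataFrame. So let's add the start of each window, and then use the resulting integer positions to retrieve the correct labels from the index: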
In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
In [47]: ilocs
Out[47]:
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 4
2014-01-26 4
Name: value, dtype: float64
In [48]: df.index.take(ilocs)
Out[48]:
Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
u'2014-01-24', u'2014-01-24', u'2014-01-24'],
dtype='object', name=u'date')
In [49]: df['rolling_min_date'] = df.index.take(ilocs)
In [50]: df
Out[50]:
value rolling_min_date
date
2014-01-20 10 2014-01-20
2014-01-21 12 2014-01-20
2014-01-22 13 2014-01-20
2014-01-23 9 2014-01-23
2014-01-24 7 2014-01-24
2014-01-25 12 2014-01-24
2014-01-26 11 2014-01-24
The same can be done for the maximum:
ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmax(), min_periods=1)
ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
df['rolling_max_date'] = df.index.take(ilocs)
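The session above was written against an older pandas: `pd.rolling_apply` has since been removed, and in Python 3 `range` is no longer a list that can be concatenated with `+`. Below is a hedged sketch of the same two-step idea for newer releases. The function name `rolling_extreme_dates` and its `kind` parameter are hypothetical, introduced only for this example. It also reverses each window before calling argmin/argmax, so that ties resolve to the most recent occurrence, which is what the question asks for.

```python
import numpy as np
import pandas as pd

def rolling_extreme_dates(values, window, kind="min"):
    """Index label of the latest min/max in each trailing window (sketch)."""
    pick = np.argmin if kind == "min" else np.argmax

    def latest_pos(w):
        # Search the reversed window so ties go to the most recent element,
        # then map the position back to the original orientation.
        return len(w) - 1 - pick(w[::-1])

    # Step 1: position of the extreme inside each window.
    local = values.rolling(window, min_periods=1).apply(latest_pos, raw=True)
    # Step 2: add each window's start to get a position in the whole series.
    starts = np.maximum(np.arange(len(values)) - window + 1, 0)
    global_pos = local.to_numpy().astype(int) + starts
    return pd.Series(values.index.to_numpy()[global_pos], index=values.index)

dates = pd.to_datetime(["2014-01-20", "2014-01-21", "2014-01-22", "2014-01-23",
                        "2014-01-24", "2014-01-25", "2014-01-26"])
df = pd.DataFrame({"value": [10, 12, 13, 9, 7, 12, 11]},
                  index=pd.Index(dates, name="date"))
df["rolling_max_date"] = rolling_extreme_dates(df["value"], 5, "max")
df["rolling_min_date"] = rolling_extreme_dates(df["value"], 5, "min")
print(df)
```

For the last row, the five-day window still contains the 13 from 2014-01-22, so that is the date reported for the maximum.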
Answer 1 (score: 1)
Here is a workaround.
import pandas as pd
import numpy as np
# sample data
# ===============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
df
value
2015-01-01 13
2015-01-02 16
2015-01-03 22
2015-01-04 1
2015-01-05 4
2015-01-06 28
2015-01-07 4
2015-01-08 8
2015-01-09 10
2015-01-10 20
2015-01-11 22
2015-01-12 19
2015-01-13 5
2015-01-14 24
2015-01-15 7
2015-01-16 25
2015-01-17 25
2015-01-18 13
2015-01-19 27
2015-01-20 2
# processing
# ==========================================
# your custom function to track the max/min value and date
def track_minmax(df):
    return pd.Series({'current_date': df.index[-1],
                      'rolling_max_val': df['value'].max(),
                      'rolling_max_date': df['value'].idxmax(),
                      'rolling_min_val': df['value'].min(),
                      'rolling_min_date': df['value'].idxmin()})
window = 5
# use list comprehension to do the for loop
pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)
rolling_max_date rolling_max_val rolling_min_date rolling_min_val
2015-01-01 NaT NaN NaT NaN
2015-01-02 NaT NaN NaT NaN
2015-01-03 NaT NaN NaT NaN
2015-01-04 NaT NaN NaT NaN
2015-01-05 2015-01-03 22 2015-01-04 1
2015-01-06 2015-01-06 28 2015-01-04 1
2015-01-07 2015-01-06 28 2015-01-04 1
2015-01-08 2015-01-06 28 2015-01-04 1
2015-01-09 2015-01-06 28 2015-01-05 4
2015-01-10 2015-01-06 28 2015-01-07 4
2015-01-11 2015-01-11 22 2015-01-07 4
2015-01-12 2015-01-11 22 2015-01-08 8
2015-01-13 2015-01-11 22 2015-01-13 5
2015-01-14 2015-01-14 24 2015-01-13 5
2015-01-15 2015-01-14 24 2015-01-13 5
2015-01-16 2015-01-16 25 2015-01-13 5
2015-01-17 2015-01-16 25 2015-01-13 5
2015-01-18 2015-01-16 25 2015-01-15 7
2015-01-19 2015-01-19 27 2015-01-15 7
2015-01-20 2015-01-19 27 2015-01-20 2
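The output above leaves the first window - 1 rows empty, whereas the question's expected output has dates from the very first row. One possible variation, sketched here under that assumption, lets the early windows be shorter than the full width. The function name `track_minmax_partial` and its `column` parameter are hypothetical; the data is the question's.

```python
import pandas as pd

def track_minmax_partial(df, window, column="value"):
    """Dates of the max/min in each trailing window, allowing short
    windows at the start (sketch; ties go to the earliest date)."""
    rows = []
    for end in range(1, len(df) + 1):
        # Trailing slice of at most `window` rows ending at row end - 1.
        chunk = df[column].iloc[max(0, end - window):end]
        rows.append({"rolling_max_date": chunk.idxmax(),
                     "rolling_min_date": chunk.idxmin()})
    return pd.DataFrame(rows, index=df.index)

question_df = pd.DataFrame(
    {"value": [10, 12, 13, 9, 7, 12, 11]},
    index=pd.date_range("2014-01-20", periods=7, freq="D"))
result = track_minmax_partial(question_df, window=5)
print(result)
```

Because this slices the DataFrame directly, idxmax and idxmin are available, at the cost of a Python-level loop over every row.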