Most recent max/min value

Time: 2015-08-10 13:23:47

Tags: python pandas

I have the following DataFrame:

date          value
2014-01-20    10
2014-01-21    12
2014-01-22    13
2014-01-23    9
2014-01-24    7
2014-01-25    12
2014-01-26    11

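I need to be able to track when the most recent maximum and minimum values occurred within a given rolling window. For example, with a rolling window of 5 periods, I need output like the following: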

date          value   rolling_max_date    rolling_min_date
2014-01-20    10      2014-01-20          2014-01-20
2014-01-21    12      2014-01-21          2014-01-20
2014-01-22    13      2014-01-22          2014-01-20
2014-01-23    9       2014-01-22          2014-01-23
2014-01-24    7       2014-01-22          2014-01-24
2014-01-25    12      2014-01-22          2014-01-24
2014-01-26    11      2014-01-25          2014-01-24

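All this shows is the date of the most recent maximum and minimum within the rolling window. I know pandas has rolling_min and rolling_max, but I don't know how to track the index/date at which the most recent max/min occurred within the window.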
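For reference, a minimal sketch that rebuilds the sample frame and shows what the rolling max/min themselves return. It assumes a recent pandas release, where the module-level rolling_max/rolling_min functions have been replaced by the `.rolling()` method; the variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question, indexed by date.
df = pd.DataFrame(
    {"value": [10, 12, 13, 9, 7, 12, 11]},
    index=pd.date_range("2014-01-20", periods=7, freq="D", name="date"),
)

# Rolling extremes over a 5-row window; min_periods=1 lets the first
# rows use the shorter windows available to them.
roll = df["value"].rolling(window=5, min_periods=1)
rolling_max = roll.max()
rolling_min = roll.min()

print(pd.DataFrame({"value": df["value"], "max": rolling_max, "min": rolling_min}))
```

These calls give the extreme values in each window but not the dates on which they occurred, which is the gap the question is about.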

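2 Answers:

Answer 0: (score: 4)

There is a more general rolling_apply, to which you can supply your own function. However, the custom function receives each window as an array rather than a DataFrame, so the index information is not available (and you therefore cannot use idxmin/idxmax).

But let's try to achieve this in two steps: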

In [41]: df = df.set_index('date')
In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
Out[42]:
            value
date
2014-01-20      0
2014-01-21      0
2014-01-22      0
2014-01-23      3
2014-01-24      4
2014-01-25      3
2014-01-26      2

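This gives you the position within the window at which the minimum was found. However, that position is relative to the particular window, not to the whole DataFrame. So let's add the start position of each window, and then use the resulting integer positions to look up the corresponding index labels: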

In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)

In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + list(range(len(ilocs_window)-4)))

In [47]: ilocs
Out[47]:
date
2014-01-20    0
2014-01-21    0
2014-01-22    0
2014-01-23    3
2014-01-24    4
2014-01-25    4
2014-01-26    4
Name: value, dtype: float64

In [48]: df.index.take(ilocs)
Out[48]:
Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
       u'2014-01-24', u'2014-01-24', u'2014-01-24'],
      dtype='object', name=u'date')

In [49]: df['rolling_min_date'] = df.index.take(ilocs)

In [50]: df
Out[50]:
            value rolling_min_date
date
2014-01-20     10       2014-01-20
2014-01-21     12       2014-01-20
2014-01-22     13       2014-01-20
2014-01-23      9       2014-01-23
2014-01-24      7       2014-01-24
2014-01-25     12       2014-01-24
2014-01-26     11       2014-01-24
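
The offset list `[0, 0, 0, 0] + range(...)` used in In [46] is hard-coded for a window of 5. As a hedged generalization: the trailing window that ends at row i starts at row max(0, i - window + 1), so the offsets can be computed for any window size. `window_starts` is a hypothetical helper name, not part of pandas or NumPy.

```python
import numpy as np

def window_starts(n_rows, window):
    """Start position of the trailing window that ends at each row."""
    return np.maximum(np.arange(n_rows) - (window - 1), 0)

offsets = window_starts(7, 5)
print(offsets)  # [0 0 0 0 0 1 2]

# Adding the offsets to the within-window positions from Out[42] gives
# the positions in the full frame shown in Out[47].
in_window = np.array([0, 0, 0, 3, 4, 3, 2])
print(in_window + offsets)  # [0 0 0 3 4 4 4]
```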

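The same can be done for the maximum:

# pass only the numeric 'value' column; df now also holds the rolling_min_date strings
ilocs_window = pd.rolling_apply(df[['value']], window=5, func=lambda x: x.argmax(), min_periods=1)
ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + list(range(len(ilocs_window)-4)))
df['rolling_max_date'] = df.index.take(ilocs)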
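
Note that `pd.rolling_apply` was later deprecated and removed from pandas, so the session above only runs on old releases. Below is a sketch of the same two-step idea using the `.rolling(...).apply(..., raw=True)` method of recent pandas versions. The helper name `rolling_extreme_dates` and the tie-breaking choice (taking the last occurrence, so that ties resolve to the most recent date) are assumptions, not part of the original answer. It also assumes the series has no missing values.

```python
import numpy as np
import pandas as pd

def rolling_extreme_dates(series, window, kind="min"):
    """Index labels of the most recent rolling min/max (hypothetical helper)."""
    arg = np.argmin if kind == "min" else np.argmax

    def last_pos(x):
        # Search the reversed window so ties resolve to the latest row,
        # then convert back to a position within the window.
        return len(x) - 1 - arg(x[::-1])

    # raw=True hands each window to last_pos as a plain NumPy array.
    in_window = series.rolling(window, min_periods=1).apply(last_pos, raw=True)
    # The window ending at row i starts at max(0, i - window + 1).
    starts = np.maximum(np.arange(len(series)) - (window - 1), 0)
    positions = in_window.to_numpy().astype(int) + starts
    return series.index[positions]

df = pd.DataFrame(
    {"value": [10, 12, 13, 9, 7, 12, 11]},
    index=pd.date_range("2014-01-20", periods=7, freq="D", name="date"),
)
df["rolling_min_date"] = rolling_extreme_dates(df["value"], 5, "min")
df["rolling_max_date"] = rolling_extreme_dates(df["value"], 5, "max")
print(df)
```

On the sample data the rolling_min_date column agrees with Out[50]. With a 5-row window, the row for 2014-01-26 still contains the 13 from 2014-01-22, so that is the date this sketch reports for the maximum there.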

Answer 1: (score: 1)

Here is a workaround.

import pandas as pd
import numpy as np

# sample data
# ===============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
df

            value
2015-01-01     13
2015-01-02     16
2015-01-03     22
2015-01-04      1
2015-01-05      4
2015-01-06     28
2015-01-07      4
2015-01-08      8
2015-01-09     10
2015-01-10     20
2015-01-11     22
2015-01-12     19
2015-01-13      5
2015-01-14     24
2015-01-15      7
2015-01-16     25
2015-01-17     25
2015-01-18     13
2015-01-19     27
2015-01-20      2

# processing
# ==========================================
# custom function to track the max/min value and date
def track_minmax(df):
    return pd.Series({
        'current_date': df.index[-1],
        'rolling_max_val': df['value'].max(),
        'rolling_max_date': df['value'].idxmax(),
        'rolling_min_val': df['value'].min(),
        'rolling_min_date': df['value'].idxmin(),
    })

window = 5
# use list comprehension to do the for loop
pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)

           rolling_max_date  rolling_max_val rolling_min_date  rolling_min_val
2015-01-01              NaT              NaN              NaT              NaN
2015-01-02              NaT              NaN              NaT              NaN
2015-01-03              NaT              NaN              NaT              NaN
2015-01-04              NaT              NaN              NaT              NaN
2015-01-05       2015-01-03               22       2015-01-04                1
2015-01-06       2015-01-06               28       2015-01-04                1
2015-01-07       2015-01-06               28       2015-01-04                1
2015-01-08       2015-01-06               28       2015-01-04                1
2015-01-09       2015-01-06               28       2015-01-05                4
2015-01-10       2015-01-06               28       2015-01-07                4
2015-01-11       2015-01-11               22       2015-01-07                4
2015-01-12       2015-01-11               22       2015-01-08                8
2015-01-13       2015-01-11               22       2015-01-13                5
2015-01-14       2015-01-14               24       2015-01-13                5
2015-01-15       2015-01-14               24       2015-01-13                5
2015-01-16       2015-01-16               25       2015-01-13                5
2015-01-17       2015-01-16               25       2015-01-13                5
2015-01-18       2015-01-16               25       2015-01-15                7
2015-01-19       2015-01-19               27       2015-01-15                7
2015-01-20       2015-01-19               27       2015-01-20                2
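
The result above is NaT for the first window - 1 rows, whereas the question's expected output fills those rows from shorter windows. A sketch of one way to get that, assuming a trailing window that is simply truncated at the start of the frame; `track_minmax_partial` is a hypothetical name. It reverses each slice before calling idxmax/idxmin so that ties resolve to the most recent date, which is also an assumption about what "most recent" should mean.

```python
import pandas as pd

def track_minmax_partial(frame, window, col="value"):
    """Most recent rolling max/min dates, allowing short leading windows."""
    rows = []
    for i in range(len(frame)):
        # Trailing window ending at row i, truncated at the start of the frame.
        chunk = frame[col].iloc[max(0, i - window + 1): i + 1]
        # idxmax/idxmin return the first match, so search newest-first.
        newest_first = chunk.iloc[::-1]
        rows.append({
            "rolling_max_date": newest_first.idxmax(),
            "rolling_min_date": newest_first.idxmin(),
        })
    return pd.DataFrame(rows, index=frame.index)

# Values copied from the sample data printed above.
values = [13, 16, 22, 1, 4, 28, 4, 8, 10, 20,
          22, 19, 5, 24, 7, 25, 25, 13, 27, 2]
df = pd.DataFrame(
    {"value": values},
    index=pd.date_range("2015-01-01", periods=20, freq="D"),
)
result = track_minmax_partial(df, window=5)
print(result)
```

Because of the tie-breaking choice, the row for 2015-01-17 reports 2015-01-17 as the max date (the later of the two 25s) rather than 2015-01-16 as in the table above.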