Fast event-based time-window calculations on a pandas time series

Date: 2018-01-09 03:27:55

Tags: python pandas datetime time-series

I have a daily time series of events (freq='D') that takes the value 0 when no event occurred and 1 when one did. Events typically occur on several consecutive days in a row.

Within each event window I want to compute two variables:

  1. The week number since the event started (with weeks ending on Saturday, i.e. W-SAT).
  2. The day number within that week since the event started.

Here is an example of what I am trying to do:

    # Dummy up a test frame
    import numpy as np
    import pandas as pd

    date = pd.date_range(start='20150101', end='20150121', freq='D')
    event = np.zeros(len(date))
    event[2:5] = 1.
    event[15:20] = 1.
    df_test = pd.DataFrame({'date': date, 'event': event})
    

    The data looks like this. As you can see, the event occurs twice within the period. I compute 'snapped_date' so that it refers to the Saturday that ends each week.

    In[2]: df_test
    Out[2]: 
             date  event
    0  2015-01-01    0.0
    1  2015-01-02    0.0
    2  2015-01-03    1.0
    3  2015-01-04    1.0
    4  2015-01-05    1.0
    5  2015-01-06    0.0
    6  2015-01-07    0.0
    7  2015-01-08    0.0
    8  2015-01-09    0.0
    9  2015-01-10    0.0
    10 2015-01-11    0.0
    11 2015-01-12    0.0
    12 2015-01-13    0.0
    13 2015-01-14    0.0
    14 2015-01-15    0.0
    15 2015-01-16    1.0
    16 2015-01-17    1.0
    17 2015-01-18    1.0
    18 2015-01-19    1.0
    19 2015-01-20    1.0
    20 2015-01-21    0.0
    

    I start by computing the week boundary for each date, as follows:

    df_test.loc[:, 'snapped_date'] = df_test.date.map(pd.tseries.frequencies.to_offset('W-SAT').rollforward)
    
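As an aside, the same snap can be done without a Python-level `map` by using period arithmetic. This is a sketch, not the question's code: `to_period('W-SAT')` assigns each date to its Sunday-to-Saturday week, and the normalized `end_time` of that period is the week-ending Saturday.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-01-01', '2015-01-03', '2015-01-04']))

# Snap each date forward to the Saturday that ends its W-SAT week.
# A date that is already a Saturday maps to itself, matching rollforward.
snapped = dates.dt.to_period('W-SAT').dt.end_time.dt.normalize()
# -> 2015-01-03, 2015-01-03, 2015-01-10
```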

    Now I would like to compute the two new columns below:

             date snapped_date  event  week_of_event  day_within_week_of_event
    0  2015-01-01   2015-01-03    0.0            0.0                       0.0
    1  2015-01-02   2015-01-03    0.0            0.0                       0.0
    2  2015-01-03   2015-01-03    1.0            1.0                       1.0
    3  2015-01-04   2015-01-10    1.0            2.0                       1.0
    4  2015-01-05   2015-01-10    1.0            2.0                       2.0
    5  2015-01-06   2015-01-10    0.0            0.0                       0.0
    6  2015-01-07   2015-01-10    0.0            0.0                       0.0
    7  2015-01-08   2015-01-10    0.0            0.0                       0.0
    8  2015-01-09   2015-01-10    0.0            0.0                       0.0
    9  2015-01-10   2015-01-10    0.0            0.0                       0.0
    10 2015-01-11   2015-01-17    0.0            0.0                       0.0
    11 2015-01-12   2015-01-17    0.0            0.0                       0.0
    12 2015-01-13   2015-01-17    0.0            0.0                       0.0
    13 2015-01-14   2015-01-17    0.0            0.0                       0.0
    14 2015-01-15   2015-01-17    0.0            0.0                       0.0
    15 2015-01-16   2015-01-17    1.0            1.0                       1.0
    16 2015-01-17   2015-01-17    1.0            1.0                       2.0
    17 2015-01-18   2015-01-24    1.0            2.0                       1.0
    18 2015-01-19   2015-01-24    1.0            2.0                       2.0
    19 2015-01-20   2015-01-24    1.0            2.0                       3.0
    20 2015-01-21   2015-01-24    0.0            0.0                       0.0
    

    Is there any time-series functionality in pandas that would help me do this in a fast and Pythonic way? I have several such series and eventually want to run this as a grouped transform.

1 Answer:

Answer 0 (score: 1)

This (admittedly ugly) solution achieves the goal:

df['new'] = ((df.date.dt.dayofweek + 1) // 7).cumsum()   # week id: increments on Sundays (Monday=0 ... Sunday=6), i.e. at each W-SAT boundary
df['new2'] = df.event.diff().ne(0).cumsum()              # run id: labels each contiguous block of equal 'event' values
# Week number = count of distinct week ids seen so far within each event run:
df['week_of_event'] = df.loc[df.event != 0].groupby('new2').new.apply(
    lambda x: x.rolling(len(x), min_periods=1).apply(lambda y: len(np.unique(y))))
# Day number = position within each (run, week) pair:
df['day_within_week_of_event'] = df.loc[df.event != 0].groupby(['new2', 'week_of_event']).cumcount() + 1


df.fillna(0)
Out[140]: 
         date  event  new  new2  week_of_event  day_within_week_of_event
0  2015-01-01    0.0    0     1            0.0                       0.0
1  2015-01-02    0.0    0     1            0.0                       0.0
2  2015-01-03    1.0    0     2            1.0                       1.0
3  2015-01-04    1.0    1     2            2.0                       1.0
4  2015-01-05    1.0    1     2            2.0                       2.0
5  2015-01-06    0.0    1     3            0.0                       0.0
6  2015-01-07    0.0    1     3            0.0                       0.0
7  2015-01-08    0.0    1     3            0.0                       0.0
8  2015-01-09    0.0    1     3            0.0                       0.0
9  2015-01-10    0.0    1     3            0.0                       0.0
10 2015-01-11    0.0    2     3            0.0                       0.0
11 2015-01-12    0.0    2     3            0.0                       0.0
12 2015-01-13    0.0    2     3            0.0                       0.0
13 2015-01-14    0.0    2     3            0.0                       0.0
14 2015-01-15    0.0    2     3            0.0                       0.0
15 2015-01-16    1.0    2     4            1.0                       1.0
16 2015-01-17    1.0    2     4            1.0                       2.0
17 2015-01-18    1.0    3     4            2.0                       1.0
18 2015-01-19    1.0    3     4            2.0                       2.0
19 2015-01-20    1.0    3     4            2.0                       3.0
20 2015-01-21    0.0    3     5            0.0                       0.0
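A possibly cleaner alternative (a sketch, not part of the answer above): reuse the question's `snapped_date` column, label event runs with the same `diff().ne(0).cumsum()` trick, then dense-rank the distinct week-ending Saturdays within each run to get `week_of_event`, and take a cumulative count per (run, week) for `day_within_week_of_event`.

```python
import numpy as np
import pandas as pd

# Rebuild the question's test frame.
date = pd.date_range(start='20150101', end='20150121', freq='D')
event = np.zeros(len(date))
event[2:5] = 1.
event[15:20] = 1.
df = pd.DataFrame({'date': date, 'event': event})
df['snapped_date'] = df.date.map(pd.tseries.frequencies.to_offset('W-SAT').rollforward)

run_id = df.event.diff().ne(0).cumsum()   # run id for each block of equal 'event' values
mask = df.event != 0

# Week number = dense rank of the week-ending Saturday within the event run.
df['week_of_event'] = (df.loc[mask].groupby(run_id[mask]).snapped_date
                         .transform(lambda s: s.rank(method='dense')))
# Day number = position within the (run, week) pair.
df['day_within_week_of_event'] = (df.loc[mask]
                                    .groupby([run_id[mask], 'snapped_date'])
                                    .cumcount() + 1)
df = df.fillna(0)
```

This reproduces the requested output on the test frame and avoids the nested rolling/unique pass, at the cost of a Python-level `rank` lambda per run.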