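I have a daily time series (freq='D') of an event that takes the value 0 when there is no event and 1 when there is one. Typically an event runs over several consecutive days.
I want to compute two variables over the span of each event: the week of the event and the day within that week of the event.
Here is an example of what I am trying to do: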
import numpy as np
import pandas as pd

# Dummy up a test frame
date = pd.date_range(start='20150101', end='20150121', freq='D')
event = np.zeros(len(date))
event[2:5] = 1.
event[15:20] = 1.
df_test = pd.DataFrame({'date': date, 'event': event})
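The data looks like this. As you can see, the event occurs twice within the time range. I also compute a 'snapped_date' that maps each date to the Saturday of its week.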
In[2]: df_test
Out[2]:
date event
0 2015-01-01 0.0
1 2015-01-02 0.0
2 2015-01-03 1.0
3 2015-01-04 1.0
4 2015-01-05 1.0
5 2015-01-06 0.0
6 2015-01-07 0.0
7 2015-01-08 0.0
8 2015-01-09 0.0
9 2015-01-10 0.0
10 2015-01-11 0.0
11 2015-01-12 0.0
12 2015-01-13 0.0
13 2015-01-14 0.0
14 2015-01-15 0.0
15 2015-01-16 1.0
16 2015-01-17 1.0
17 2015-01-18 1.0
18 2015-01-19 1.0
19 2015-01-20 1.0
20 2015-01-21 0.0
I started by computing the week boundary for each date, as follows:
df_test.loc[:, 'snapped_date'] = df_test.date.map(pd.tseries.frequencies.to_offset('W-SAT').rollforward)
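As a quick sanity check of the snapping step, here is a minimal sketch of what `rollforward` does for a `W-SAT` offset: a Saturday should stay where it is, while any other day should move forward to the next Saturday. The variable names are illustrative only.

```python
import pandas as pd

# Weekly offset anchored on Saturday, the same one used for snapped_date
week_end = pd.tseries.frequencies.to_offset('W-SAT')

saturday = pd.Timestamp('2015-01-03')  # already a Saturday
sunday = pd.Timestamp('2015-01-04')    # first day of the next Sunday-to-Saturday week

# rollforward leaves on-offset dates alone and moves the rest forward
snapped_saturday = week_end.rollforward(saturday)
snapped_sunday = week_end.rollforward(sunday)

print(snapped_saturday.date(), snapped_sunday.date())
```

This agrees with rows 2 and 3 of the desired output, where 2015-01-03 snaps to itself and 2015-01-04 snaps to 2015-01-10.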
Now I want to compute the two new columns shown below:
date snapped_date event week_of_event day_within_week_of_event
0 2015-01-01 2015-01-03 0.0 0.0 0.0
1 2015-01-02 2015-01-03 0.0 0.0 0.0
2 2015-01-03 2015-01-03 1.0 1.0 1.0
3 2015-01-04 2015-01-10 1.0 2.0 1.0
4 2015-01-05 2015-01-10 1.0 2.0 2.0
5 2015-01-06 2015-01-10 0.0 0.0 0.0
6 2015-01-07 2015-01-10 0.0 0.0 0.0
7 2015-01-08 2015-01-10 0.0 0.0 0.0
8 2015-01-09 2015-01-10 0.0 0.0 0.0
9 2015-01-10 2015-01-10 0.0 0.0 0.0
10 2015-01-11 2015-01-17 0.0 0.0 0.0
11 2015-01-12 2015-01-17 0.0 0.0 0.0
12 2015-01-13 2015-01-17 0.0 0.0 0.0
13 2015-01-14 2015-01-17 0.0 0.0 0.0
14 2015-01-15 2015-01-17 0.0 0.0 0.0
15 2015-01-16 2015-01-17 1.0 1.0 1.0
16 2015-01-17 2015-01-17 1.0 1.0 2.0
17 2015-01-18 2015-01-24 1.0 2.0 1.0
18 2015-01-19 2015-01-24 1.0 2.0 2.0
19 2015-01-20 2015-01-24 1.0 2.0 3.0
20 2015-01-21 2015-01-24 0.0 0.0 0.0
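Is there any time series functionality in pandas that can help me do this in a fast and Pythonic way? I have several tseries like this and eventually want to be able to do a groupby-transform.
Answer 0 (score: 1)
Here is an ugly solution that gets the job done....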
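# df is the question's test frame (df_test)
# Week counter that increments on every Sunday, so weeks run Sunday to Saturday
df['new']=((df.date.dt.dayofweek+1)//7).cumsum()
# Run id that changes whenever event flips between 0 and 1
df['new2']=df.event.diff().ne(0).cumsum()
# Within each event run, count the distinct weeks seen so far (group_keys=False keeps the original index)
df['week_of_event']=df.loc[df.event!=0].groupby('new2', group_keys=False).new.apply(lambda x : x.rolling(len(x), min_periods=1).apply(lambda y: len(np.unique(y))))
# Position of each day within its (run, week) pair
df['day_within_week_of_event']=df.loc[df.event!=0].groupby(['new2','week_of_event']).cumcount()+1
df.fillna(0)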
Out[140]:
date event new new2 week_of_event day_within_week_of_event
0 2015-01-01 0.0 0 1 0.0 0.0
1 2015-01-02 0.0 0 1 0.0 0.0
2 2015-01-03 1.0 0 2 1.0 1.0
3 2015-01-04 1.0 1 2 2.0 1.0
4 2015-01-05 1.0 1 2 2.0 2.0
5 2015-01-06 0.0 1 3 0.0 0.0
6 2015-01-07 0.0 1 3 0.0 0.0
7 2015-01-08 0.0 1 3 0.0 0.0
8 2015-01-09 0.0 1 3 0.0 0.0
9 2015-01-10 0.0 1 3 0.0 0.0
10 2015-01-11 0.0 2 3 0.0 0.0
11 2015-01-12 0.0 2 3 0.0 0.0
12 2015-01-13 0.0 2 3 0.0 0.0
13 2015-01-14 0.0 2 3 0.0 0.0
14 2015-01-15 0.0 2 3 0.0 0.0
15 2015-01-16 1.0 2 4 1.0 1.0
16 2015-01-17 1.0 2 4 1.0 2.0
17 2015-01-18 1.0 3 4 2.0 1.0
18 2015-01-19 1.0 3 4 2.0 2.0
19 2015-01-20 1.0 3 4 2.0 3.0
20 2015-01-21 0.0 3 5 0.0 0.0
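For comparison, the same two columns can probably be computed without the nested `rolling(...).apply(...)`. The following is only a sketch under stated assumptions, not the answer's method: it assumes the dates are daily with no gaps inside an event run, and `event_week_columns`, `run` and `week_end` are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

def event_week_columns(frame):
    """Hypothetical helper: add week_of_event and day_within_week_of_event."""
    out = frame.copy()
    is_event = out['event'].ne(0)
    # Run id: increments whenever the event flag flips between 0 and 1
    run = is_event.ne(is_event.shift()).cumsum().rename('run')
    # Saturday closing each date's Sunday-to-Saturday week (Saturday is dayofweek 5)
    days_to_saturday = (5 - out['date'].dt.dayofweek) % 7
    week_end = (out['date'] + pd.to_timedelta(days_to_saturday, unit='D')).rename('week_end')
    # Week number inside a run: whole weeks elapsed since the run's first week, plus one
    # (assumes daily data without gaps inside a run)
    first_week_end = week_end.groupby(run).transform('min')
    week_no = (week_end - first_week_end).dt.days // 7 + 1
    # Day number inside each (run, week) pair
    day_no = out.groupby([run, week_end]).cumcount() + 1
    # Non-event days get 0 in both columns
    out['week_of_event'] = week_no.where(is_event, 0)
    out['day_within_week_of_event'] = day_no.where(is_event, 0)
    return out

# Same test frame as in the question
date = pd.date_range(start='20150101', end='20150121', freq='D')
event = np.zeros(len(date))
event[2:5] = 1.
event[15:20] = 1.
df_test = pd.DataFrame({'date': date, 'event': event})

result = event_week_columns(df_test)
print(result)
```

On the test frame this should reproduce the `week_of_event` and `day_within_week_of_event` columns above, as integers rather than floats. For several series stacked in one frame, the helper could be applied to each series separately, for example by iterating over a `groupby` on a hypothetical `series_id` column and concatenating the pieces.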