I have a DataFrame with a date column, and I want to resample it over custom ranges within each month (semi-weekly periods).
For example, this is the DataFrame:
Date
2012-01-19 130.0
2011-03-03 120.0
2011-05-05 105.0
2011-06-06 175.0
2011-06-06 200.0
2011-07-07 102.0
2011-08-08 300.0
2011-09-09 200.0
2011-11-11 10.0
2012-03-20 10.0
2011-12-12 130.0
2012-01-01 30.0
2012-02-02 150.0
2012-02-15 200.0
2012-03-15 120.0
2012-02-25 10.0
2012-01-29 90.0
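For anyone who wants to reproduce this, the sample above could be loaded roughly as follows. The value column has no name in my listing, so `Value` is only a placeholder I am assuming here.

```python
import pandas

# Rows copied from the listing above; "Value" is a placeholder column name.
rows = [
    ("2012-01-19", 130.0), ("2011-03-03", 120.0), ("2011-05-05", 105.0),
    ("2011-06-06", 175.0), ("2011-06-06", 200.0), ("2011-07-07", 102.0),
    ("2011-08-08", 300.0), ("2011-09-09", 200.0), ("2011-11-11", 10.0),
    ("2012-03-20", 10.0), ("2011-12-12", 130.0), ("2012-01-01", 30.0),
    ("2012-02-02", 150.0), ("2012-02-15", 200.0), ("2012-03-15", 120.0),
    ("2012-02-25", 10.0), ("2012-01-29", 90.0),
]
df = pandas.DataFrame(rows, columns=["Date", "Value"])
# Parse the strings so that .day, .month and .year are available per row.
df["Date"] = pandas.to_datetime(df["Date"])
```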
I want to resample it in a semi-weekly fashion (the date ranges are inclusive):
2011-01-01
2011-01-02 - 2011-01-07
2011-01-08 - 2011-01-15
2011-01-16 - 2011-01-22
2011-01-23 - 2011-01-28
2011-02-01
2011-02-02 - 2011-02-07
2011-02-08 - 2011-02-15
2011-02-16 - 2011-02-22
2011-02-23 - 2011-02-28
etc.
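To make the intended ranges concrete, here is a minimal sketch of how a date could be mapped to the position of its range within the month. The names `RANGE_ENDS` and `range_index` are made up for this illustration, and I am assuming that days 29-31 fold into the last range. The boundaries follow the list above (8-15, 16-22), which is slightly different from the `day < 15` and `day < 22` thresholds in my own function further down.

```python
import bisect
import datetime

# Last day of each listed range except the final one: 1, 2-7, 8-15, 16-22.
RANGE_ENDS = [1, 7, 15, 22]

def range_index(date):
    """Return 0-4: the position of the listed range that contains `date`.

    Assumption: days 29-31 are folded into the last range (23-28).
    """
    # bisect_left finds the first range whose last day is >= date.day.
    return bisect.bisect_left(RANGE_ENDS, date.day)

for text in ["2011-01-01", "2011-01-07", "2011-01-15", "2011-01-16", "2011-01-31"]:
    print(text, range_index(datetime.date.fromisoformat(text)))
```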
My current solution is as follows: compute a "sticky date" for each row, then sum per sticky date with groupby("StickyDate").sum():
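import pandas

def get_week_index_by_date(date):
    # Bucket index within the month: 0 for the 1st, then 1-4 for later days.
    if date.day == 1:
        return 0
    if date.day < 8:
        return 1
    if date.day < 15:
        return 2
    if date.day < 22:
        return 3
    else:
        return 4

def get_sticky_date(date):
    # Label each bucket with day 1, 8, 15, 22 or 28 of the same month.
    return pandas.Timestamp(year=date.year, month=date.month, day=min(get_week_index_by_date(date) * 7 + 1, 28))

df["StickyDate"] = df.Date.apply(get_sticky_date)
# numeric_only=True keeps the datetime Date column out of the sum.
weekly_progress = df.groupby("StickyDate").sum(numeric_only=True)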
Now I need to reindex so that the missing dates show up as 0 in my series, sorted by date:
def get_weekly_progress_index(start_date, end_date):
    # Yield the five sticky dates (days 1, 8, 15, 22, 28) of every month in range.
    curr_date = start_date
    # <= so that a month is still covered when end_date falls on its first day.
    while curr_date <= end_date:
        for i in range(5):
            yield pandas.Timestamp(year=curr_date.year, month=curr_date.month, day=min(i * 7 + 1, 28))
        curr_date += pandas.offsets.MonthBegin(1)

new_index = list(get_weekly_progress_index(weekly_progress.index.min(), weekly_progress.index.max()))
weekly_progress = weekly_progress.reindex(new_index).fillna(0)
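To show what the reindex step is meant to do, here is a small self-contained sketch for a single month. It uses the two 2011-06-06 rows from my sample (175 + 200 = 375), which land on the sticky date 2011-06-08. The names `weekly`, `full_index` and `filled` are only for this illustration, and passing `fill_value=0` to `reindex` is used here as a shorthand for `.reindex(...).fillna(0)`.

```python
import pandas

# Stand-in for weekly_progress restricted to June 2011: one populated bucket.
weekly = pandas.Series([375.0], index=pandas.to_datetime(["2011-06-08"]))

# All five sticky dates for June 2011: days 1, 8, 15, 22 and 28.
full_index = pandas.to_datetime(
    [f"2011-06-{min(i * 7 + 1, 28):02d}" for i in range(5)]
)

# Buckets without data are filled with 0 and the result follows full_index order.
filled = weekly.reindex(full_index, fill_value=0)
print(filled)
```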
My question: is there a simpler way to achieve this, or is this the best approach?