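I have a dataset of weekly data, but when a week spans two months I need to compute the monthly average by weighting each row by the share of the week that falls in that month. For example: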
Current_Week Sales
0 29/Dec/2013-04/Jan/2014 3685.236419
1 05/Jan/2014-11/Jan/2014 3784.023564
2 12/Jan/2014-18/Jan/2014 3726.933727
3 19/Jan/2014-25/Jan/2014 3690.440944
4 26/Jan/2014-01/Feb/2014 3731.523630
5 02/Feb/2014-08/Feb/2014 3753.882783
6 09/Feb/2014-15/Feb/2014 3643.997381
7 16/Feb/2014-22/Feb/2014 3696.243919
8 23/Feb/2014-01/Mar/2014 3718.254426
The final desired output is:
Month Sales
1-Jan-2014 3727.09
1-Feb-2014 3703.57
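Note that for row 0 of the input DataFrame I need to calculate the weightage, that is, the number of days in that week for that month, so that it can be used later to compute the sales average. For example, the monthly sales figure for January is obtained by summing all the weighted weekly sales and then dividing by the weighted number of days: 16505.69 / 4.42 = 3727.09
I know that if a week spans two months I first have to split it into two rows and then sum each part separately. Am I missing something?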
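The weighting described in the question can be sketched with the standard library alone. This is a minimal illustration, not the asker's code: the function name `monthly_weighted_sales` and the `weeks` list are made up for the example, and it assumes every row covers exactly seven days, so each day carries a weight of 1/7. Under that assumption the January weights sum to 31/7, about 4.43, so the 4.42 quoted above appears to be truncated.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Weekly rows copied from the question: (week label, average sales).
weeks = [
    ("29/Dec/2013-04/Jan/2014", 3685.236419),
    ("05/Jan/2014-11/Jan/2014", 3784.023564),
    ("12/Jan/2014-18/Jan/2014", 3726.933727),
    ("19/Jan/2014-25/Jan/2014", 3690.440944),
    ("26/Jan/2014-01/Feb/2014", 3731.523630),
    ("02/Feb/2014-08/Feb/2014", 3753.882783),
    ("09/Feb/2014-15/Feb/2014", 3643.997381),
    ("16/Feb/2014-22/Feb/2014", 3696.243919),
    ("23/Feb/2014-01/Mar/2014", 3718.254426),
]


def monthly_weighted_sales(rows):
    """Average sales per (year, month), weighting each week by its days in that month."""
    weighted_sum = defaultdict(float)  # (year, month) -> sum of sales * weight
    weight_total = defaultdict(float)  # (year, month) -> sum of weights
    for label, sales in rows:
        start_text, end_text = label.split("-")
        day = datetime.strptime(start_text, "%d/%b/%Y").date()
        end = datetime.strptime(end_text, "%d/%b/%Y").date()
        while day <= end:
            key = (day.year, day.month)
            weighted_sum[key] += sales / 7  # assumption: one day is 1/7 of a week
            weight_total[key] += 1 / 7
            day += timedelta(days=1)
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}


result = monthly_weighted_sales(weeks)
print(round(result[(2014, 1)], 2))  # 3727.09
print(round(result[(2014, 2)], 2))  # 3703.57
```

The result also contains entries for December 2013 and March 2014, which are covered only partially by the data and are therefore just the value of the single overlapping week.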
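Answer 0: (score: 2)
Assuming the weeks are contiguous, we only need to care about the start of each week (each week ends the day before the next one starts). Expanding the data to one row per day, forward-filling, and averaging by month then weights every week by the number of its days in that month: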
import pandas as pd

# get start and end dates of the weeks
time_df = df.Current_Week.str.split('-', expand=True)
time_df.columns = ['start', 'end']

# convert to datetime
time_df = time_df.apply(pd.to_datetime)

# combine with original data
new_df = pd.concat((df, time_df), sort=False, axis=1)

# all the dates in range
all_dates = pd.date_range(new_df.start.iloc[0], new_df.end.iloc[-1], freq='D')

# set start as index so the weekly values can be spread over days
new_df = (new_df[['Sales', 'start']]
          .set_index('start')
          .reindex(all_dates)  # one row per calendar day
          .ffill()             # fill missing days with the week's value
          .resample('MS')      # group by month start
          .mean()              # take the monthly mean
          )
Output:
Sales
2013-12-01 3685.236419
2014-01-01 3727.092745
2014-02-01 3703.568527
2014-03-01 3718.254426
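The output above includes December 2013 and March 2014, which the data covers only partially, while the desired output in the question lists only January and February. One possible way to keep just the fully covered months is sketched below. It is a self-contained variant of the same daily forward-fill idea rather than the answerer's exact code; the names `daily`, `stats`, and `full` are illustrative, and the explicit `format="%d/%b/%Y"` assumes English month abbreviations.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Current_Week": [
        "29/Dec/2013-04/Jan/2014", "05/Jan/2014-11/Jan/2014",
        "12/Jan/2014-18/Jan/2014", "19/Jan/2014-25/Jan/2014",
        "26/Jan/2014-01/Feb/2014", "02/Feb/2014-08/Feb/2014",
        "09/Feb/2014-15/Feb/2014", "16/Feb/2014-22/Feb/2014",
        "23/Feb/2014-01/Mar/2014",
    ],
    "Sales": [
        3685.236419, 3784.023564, 3726.933727, 3690.440944, 3731.523630,
        3753.882783, 3643.997381, 3696.243919, 3718.254426,
    ],
})

# Parse the week boundaries.
bounds = df["Current_Week"].str.split("-", expand=True)
start = pd.to_datetime(bounds[0], format="%d/%b/%Y")
end = pd.to_datetime(bounds[1], format="%d/%b/%Y")

# One value per calendar day: each day inherits its week's sales figure.
daily = (
    pd.Series(df["Sales"].to_numpy(), index=pd.DatetimeIndex(start))
    .reindex(pd.date_range(start.iloc[0], end.iloc[-1], freq="D"))
    .ffill()
)

# Monthly mean plus the number of days actually present in each month.
stats = daily.resample("MS").agg(["mean", "count"])

# Keep only months where every calendar day is covered by the data.
full = stats[stats["count"].to_numpy() == stats.index.days_in_month.to_numpy()]
monthly = full["mean"].round(2)
print(monthly)
```

Comparing the day count with `days_in_month` avoids hard-coding which months to drop, so the same filter would still apply if more weeks were appended to the data.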
Answer 1: (score: 0)
Total sales amount per month (this assumes the data already has a Month column):
data.groupby('Month')['sales'].sum()