Pandas: split a weekly time series and group by month

Time: 2019-07-08 04:28:14

Tags: python python-3.x pandas dataframe

I have a dataset of weekly data. When a week spans two months, I need to compute the monthly average by weighting that row. For example:

  Current_Week             Sales
0 29/Dec/2013-04/Jan/2014  3685.236419
1 05/Jan/2014-11/Jan/2014  3784.023564
2 12/Jan/2014-18/Jan/2014  3726.933727
3 19/Jan/2014-25/Jan/2014  3690.440944
4 26/Jan/2014-01/Feb/2014  3731.523630
5 02/Feb/2014-08/Feb/2014  3753.882783
6 09/Feb/2014-15/Feb/2014  3643.997381
7 16/Feb/2014-22/Feb/2014  3696.243919
8 23/Feb/2014-01/Mar/2014  3718.254426
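To make the examples reproducible, the input table above can be built as a DataFrame. The variable name df is an assumption, chosen to match the name the first answer uses:

```python
import pandas as pd

# Weekly input from the question; the name `df` is assumed so that
# code referring to `df` can run against it.
df = pd.DataFrame({
    "Current_Week": [
        "29/Dec/2013-04/Jan/2014",
        "05/Jan/2014-11/Jan/2014",
        "12/Jan/2014-18/Jan/2014",
        "19/Jan/2014-25/Jan/2014",
        "26/Jan/2014-01/Feb/2014",
        "02/Feb/2014-08/Feb/2014",
        "09/Feb/2014-15/Feb/2014",
        "16/Feb/2014-22/Feb/2014",
        "23/Feb/2014-01/Mar/2014",
    ],
    "Sales": [
        3685.236419, 3784.023564, 3726.933727, 3690.440944, 3731.523630,
        3753.882783, 3643.997381, 3696.243919, 3718.254426,
    ],
})
```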

The desired final output is:

Month       Sales
1-Jan-2014  3727.09
1-Feb-2014  3703.57

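The catch is that for row 0 of the input DataFrame, I need to compute the weightage as the number of days in that week for that month, so that it can later be used to calculate the Sales average. For example, for January: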

[Image: the weighted calculation for January]

As you can see, January's monthly sales figure is obtained by summing all the weighted sales and then dividing by the total weight: 16505.69 / 4.42 = 3727.09
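Written out, the January figure weights each week by the fraction of its 7 days that fall in January (4 of 7 for row 0, 6 of 7 for row 4, a full 1 for rows 1 to 3). The total weight is 31/7 = 4.428..., which the question writes as 4.42. This is the same as averaging the 31 daily values when each day carries its week's Sales:

```latex
\text{Sales}_{\text{Jan}}
  = \frac{\tfrac{4}{7}(3685.24) + 3784.02 + 3726.93 + 3690.44 + \tfrac{6}{7}(3731.52)}
         {\tfrac{4}{7} + 1 + 1 + 1 + \tfrac{6}{7}}
  = \frac{16505.69}{31/7}
  \approx 3727.09
```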

I know that if a week spans two months, I first have to split it into two rows, and then group by month and sum. Am I missing something?
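The split-then-aggregate idea from the question can be sketched as follows. The helper name monthly_weighted_mean and the %d/%b/%Y date format are assumptions. Each week is broken into one piece per calendar month it touches, each piece is weighted by the fraction of the week's days in that month, and a weighted mean is taken per month:

```python
import pandas as pd

def monthly_weighted_mean(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: split each week at month boundaries, weight each
    piece by its share of the week's days, then take a weighted mean per month."""
    parts = []
    for week, sales in zip(df["Current_Week"], df["Sales"]):
        start_s, end_s = week.split("-")
        start = pd.to_datetime(start_s, format="%d/%b/%Y")
        end = pd.to_datetime(end_s, format="%d/%b/%Y")
        days = pd.date_range(start, end, freq="D")
        # how many of this week's days fall in each calendar month
        counts = days.to_period("M").value_counts()
        for month, n_days in counts.items():
            parts.append({"Month": month, "Sales": sales,
                          "weight": n_days / len(days)})
    pieces = pd.DataFrame(parts)
    pieces["weighted"] = pieces["Sales"] * pieces["weight"]
    # sum weighted sales and weights per month, then divide
    grouped = pieces.groupby("Month")[["weighted", "weight"]].sum()
    return grouped["weighted"] / grouped["weight"]
```

Unlike a plain groupby and sum, this divides by the summed weights, which is what turns the monthly total into the weighted average shown in the desired output.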

2 answers:

Answer 0 (score: 2)

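Assuming the weeks are contiguous, we only need to care about the start of each week (each week ends the day before the next one starts):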

import pandas as pd

# split "start-end" strings into the start and end dates of each week
time_df = df.Current_Week.str.split('-', expand=True)
time_df.columns = ['start', 'end']

# convert to datetime (dates look like 29/Dec/2013)
time_df = time_df.apply(pd.to_datetime, format='%d/%b/%Y')

# combine with original data
new_df = pd.concat((df, time_df), sort=False, axis=1)

# every calendar day from the first week's start to the last week's end
all_dates = pd.date_range(new_df.start.iloc[0], new_df.end.iloc[-1], freq='D')

# index by week start, expand to one row per day, then average per month
new_df = (new_df[['Sales', 'start']]
          .set_index('start')
          .reindex(all_dates)  # one row per calendar day
          .ffill()             # each day inherits its week's Sales
          .resample('MS')      # group by calendar month (labelled by month start)
          .mean()              # mean of daily values = day-weighted average
          )

Output:

                  Sales
2013-12-01  3685.236419
2014-01-01  3727.092745
2014-02-01  3703.568527
2014-03-01  3718.254426
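Forward-filling each week's value onto its days, as above, is only correct if the weeks are contiguous, which the answer assumes at the outset. A minimal sketch of a check for that assumption; the helper name weeks_are_contiguous is hypothetical and the %d/%b/%Y format is taken from the question's data:

```python
import pandas as pd

def weeks_are_contiguous(current_week: pd.Series) -> bool:
    """Hypothetical helper: True if each week starts the day after the
    previous week ends."""
    parts = current_week.str.split("-", expand=True)
    start = pd.to_datetime(parts[0], format="%d/%b/%Y")
    end = pd.to_datetime(parts[1], format="%d/%b/%Y")
    # from the second week on, each start should equal the previous end + 1 day
    expected = end.shift(1) + pd.Timedelta(days=1)
    return bool((start.iloc[1:] == expected.iloc[1:]).all())
```

If this returns False (for example, a missing week), the forward fill would silently stretch the previous week's Sales over the gap.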

Answer 1 (score: 0)

Total sales per month:

# assumes `data` already has 'Month' and 'Sales' columns
data.groupby('Month')['Sales'].sum()
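The one-liner needs a Month column that the question's data does not have. One way to create it is sketched below, assuming each whole week is assigned to the month in which it starts; the helper name monthly_totals_by_week_start is hypothetical. Note that this gives plain monthly totals, without the day weighting the question asks for:

```python
import pandas as pd

def monthly_totals_by_week_start(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: sum weekly Sales per calendar month, assigning
    each whole week to the month it starts in (no splitting or weighting)."""
    start = pd.to_datetime(data["Current_Week"].str.split("-").str[0],
                           format="%d/%b/%Y")
    data = data.assign(Month=start.dt.to_period("M"))
    return data.groupby("Month")["Sales"].sum()
```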