我已经堆叠了,我需要stackoverflow的智慧。
我想基于过去使用熊猫的星期几为销售创建一个简单的平均模型。
我的意思是,该模型根据过去的4个星期一预测下周一 Sales 的价值。此外,如果过去没有足够的周数,只需平均现有几周。
我的时间序列样本df:
weekDay weekNum Sales
Date
2013-01-01 2 1 22
2013-01-02 3 1 33
2013-01-03 4 1 44
2013-01-04 5 1 55
2013-01-05 6 1 66
2013-01-06 7 2 76
2013-01-07 1 2 23
2013-01-08 2 2 55
2013-01-09 3 2 34
2013-01-10 4 2 43
2013-01-11 5 2 34
2013-01-12 6 2 53
2013-01-13 7 3 52
2013-01-14 1 3 41
2013-01-15 2 3 31
2013-01-16 3 3 31
2013-01-17 4 3 42
2013-01-18 5 3 23
2013-01-19 6 3 41
2013-01-20 7 4 31
2013-01-21 1 4 31
2013-01-22 2 4 31
2013-01-23 3 4 43
2013-01-24 4 4 53
2013-01-25 5 4 32
2013-01-26 6 4 12
2013-01-27 7 5 41
2013-01-28 1 5 41
2013-01-29 2 5 12
2013-01-30 3 5 76
2013-01-31 4 5 43
2013-02-01 5 5 32
2013-02-02 6 5 54
2013-02-03 7 6 43
2013-02-04 1 6 43
2013-02-05 2 6 12
2013-02-06 3 6 12
2013-02-07 4 6 43
2013-02-08 5 6 22
2013-02-09 6 6 12
我尝试了什么:
ts.resample('D').rolling(28).mean()["Sales"]
ts.resample('W-MON').rolling(1).mean()["Sales"]
但它没有帮助。
答案 0 :(得分:1)
好的,抱歉,我想我现在明白你在找什么:
res = df.join(df.groupby('weekDay').rolling('28d').mean(), rsuffix='_avg')
所以现在res
包含:
weekDay weekNum Sales weekDay_avg weekNum_avg Sales_avg weekDay date 1 2013-01-07 1 2 23 1.0 2.0 23.000000 2013-01-14 1 3 41 1.0 2.5 32.000000 2013-01-21 1 4 31 1.0 3.0 31.666667 2013-01-28 1 5 41 1.0 3.5 34.000000 2013-02-04 1 6 43 1.0 4.5 39.000000 2 2013-01-01 2 1 22 2.0 1.0 22.000000 2013-01-08 2 2 55 2.0 1.5 38.500000 2013-01-15 2 3 31 2.0 2.0 36.000000 2013-01-22 2 4 31 2.0 2.5 34.750000 2013-01-29 2 5 12 2.0 3.5 32.250000 2013-02-05 2 6 12 2.0 4.5 21.500000 3 2013-01-02 3 1 33 3.0 1.0 33.000000 2013-01-09 3 2 34 3.0 1.5 33.500000 2013-01-16 3 3 31 3.0 2.0 32.666667 2013-01-23 3 4 43 3.0 2.5 35.250000 2013-01-30 3 5 76 3.0 3.5 46.000000 2013-02-06 3 6 12 3.0 4.5 40.500000 4 2013-01-03 4 1 44 4.0 1.0 44.000000 2013-01-10 4 2 43 4.0 1.5 43.500000 2013-01-17 4 3 42 4.0 2.0 43.000000 2013-01-24 4 4 53 4.0 2.5 45.500000 2013-01-31 4 5 43 4.0 3.5 45.250000 2013-02-07 4 6 43 4.0 4.5 45.250000 5 2013-01-04 5 1 55 5.0 1.0 55.000000 2013-01-11 5 2 34 5.0 1.5 44.500000 2013-01-18 5 3 23 5.0 2.0 37.333333 2013-01-25 5 4 32 5.0 2.5 36.000000 2013-02-01 5 5 32 5.0 3.5 30.250000 2013-02-08 5 6 22 5.0 4.5 27.250000 6 2013-01-05 6 1 66 6.0 1.0 66.000000 2013-01-12 6 2 53 6.0 1.5 59.500000 2013-01-19 6 3 41 6.0 2.0 53.333333 2013-01-26 6 4 12 6.0 2.5 43.000000 2013-02-02 6 5 54 6.0 3.5 40.000000 2013-02-09 6 6 12 6.0 4.5 29.750000 7 2013-01-06 7 2 76 7.0 2.0 76.000000 2013-01-13 7 3 52 7.0 2.5 64.000000 2013-01-20 7 4 31 7.0 3.0 53.000000 2013-01-27 7 5 41 7.0 3.5 50.000000 2013-02-03 7 6 43 7.0 4.5 41.750000
并且为了使它类似于我们开始使用的好的'DataFrame:
res.drop(['weekDay', 'weekDay_avg', 'weekNum_avg'], axis=1, inplace=True)
res.reset_index(inplace=True)
res.set_index('date', inplace=True)
res.sort_index(inplace=True)
所以现在它看起来像这样:
weekDay weekNum Sales Sales_avg date 2013-01-01 2 1 22 22.000000 2013-01-02 3 1 33 33.000000 2013-01-03 4 1 44 44.000000 2013-01-04 5 1 55 55.000000 2013-01-05 6 1 66 66.000000 2013-01-06 7 2 76 76.000000 2013-01-07 1 2 23 23.000000 2013-01-08 2 2 55 38.500000 2013-01-09 3 2 34 33.500000 2013-01-10 4 2 43 43.500000 2013-01-11 5 2 34 44.500000 2013-01-12 6 2 53 59.500000 2013-01-13 7 3 52 64.000000 2013-01-14 1 3 41 32.000000 2013-01-15 2 3 31 36.000000 2013-01-16 3 3 31 32.666667 2013-01-17 4 3 42 43.000000 2013-01-18 5 3 23 37.333333 2013-01-19 6 3 41 53.333333 2013-01-20 7 4 31 53.000000 2013-01-21 1 4 31 31.666667 2013-01-22 2 4 31 34.750000 2013-01-23 3 4 43 35.250000 2013-01-24 4 4 53 45.500000 2013-01-25 5 4 32 36.000000 2013-01-26 6 4 12 43.000000 2013-01-27 7 5 41 50.000000 2013-01-28 1 5 41 34.000000 2013-01-29 2 5 12 32.250000 2013-01-30 3 5 76 46.000000 2013-01-31 4 5 43 45.250000 2013-02-01 5 5 32 30.250000 2013-02-02 6 5 54 40.000000 2013-02-03 7 6 43 41.750000 2013-02-04 1 6 43 39.000000 2013-02-05 2 6 12 21.500000 2013-02-06 3 6 12 40.500000 2013-02-07 4 6 43 45.250000 2013-02-08 5 6 22 27.250000 2013-02-09 6 6 12 29.750000
希望这次真的有帮助(: