How do I subtract the mean of past calendar weeks from the current value?

Time: 2017-02-20 18:14:12

Tags: python python-2.7 pandas dataframe subtraction

I have a DataFrame, df_pct_Max, that looks like this:

Date        Value1  Value2
01.01.2015  5       6
08.01.2015  3       2
...         ...     ...
28.01.2017  7       8

I want to compute the mean for each calendar week and subtract it from the actual values in that calendar week.

I created a DataFrame containing the mean for each calendar week, like this:

df_weekly_avg_Max = df_pct_Max.groupby(df_pct_Max.index.week).mean()

This produces the DataFrame df_weekly_avg_Max:

KW  Value1  Value2
 1  3.5     4.3
 2  4       3
 …  …       …
52  8.33    6.2

Now I am trying to subtract df_weekly_avg_Max from df_pct_Max, matching the rows by calendar week.

I tried adding a 'KW' column and then running

dfresult = df_pct_Max.sub(df_weekly_avg_Max, axis='KW')

but that raises an error.

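Is there also a way to do this on a rolling basis (for example, starting from week 1 of 2015 and 2016, using the mean of calendar week 1 over the past 3 years...)? Can anyone help with this?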

2 Answers:

Answer 0: (score: 1)

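This answer is not very clean, since it makes little use of pandas, but I don't think it will be slow either (depending on how large your DataFrame is). The basic idea is to build a list in which each weekly mean is repeated once for every day in that week, so that you can simply subtract.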

CODE:

from collections import Counter
import pandas as pd
import numpy as np

#Build up example data frame
num_days = 15
dates = pd.date_range('1/1/2015', periods=num_days, freq='D')
val1s = np.random.random_integers(1, 30, num_days)
val2s = np.random.random_integers(1, 30, num_days)

df_pct_MAX = pd.DataFrame({'Date':dates, 'Value1':val1s, 'Value2':val2s})
df_pct_MAX['Day'] = df_pct_MAX['Date'].dt.weekday_name
df_pct_MAX['Week'] = df_pct_MAX['Date'].dt.week

#OPs logic to get means
df_weekly_avg_Max = df_pct_MAX.groupby(df_pct_MAX['Week']).mean()

#Build up a list of the means repeated once for each day in that week
mean_fields = ['Value1','Value2'] #<-- only hardcoded portion
means_dict = {k:list(df_weekly_avg_Max[k]) for k in mean_fields} #<-- convert means into lists keyed by field
week_counts = Counter(df_pct_MAX['Week']).values() #<-- count how many days are represented in each week

#Build up a dict keyed by field with the means repeated the correct number of times
means = {k:[means_dict[k][i] for i,count in enumerate(week_counts)
         for x in range(count)] for k in mean_fields}

#Assign a new column to the means for each field (not necessary, just to show done correctly)
for k in mean_fields:
    df_pct_MAX[k+'Mean'] = means[k]

print(df_pct_MAX)

Output:

         Date  Value1  Value2        Day  Week  Value1Mean  Value2Mean
0  2015-01-01      12      19   Thursday     1    9.000000   19.250000
1  2015-01-02      15      27     Friday     1    9.000000   19.250000
2  2015-01-03       2      30   Saturday     1    9.000000   19.250000
3  2015-01-04       7       1     Sunday     1    9.000000   19.250000
4  2015-01-05       6      20     Monday     2   17.571429   14.142857
5  2015-01-06       9      24    Tuesday     2   17.571429   14.142857
6  2015-01-07      25      17  Wednesday     2   17.571429   14.142857
7  2015-01-08      22       8   Thursday     2   17.571429   14.142857
8  2015-01-09      30       7     Friday     2   17.571429   14.142857
9  2015-01-10      10       1   Saturday     2   17.571429   14.142857
10 2015-01-11      21      22     Sunday     2   17.571429   14.142857
11 2015-01-12      23      29     Monday     3   23.750000   19.750000
12 2015-01-13      23      16    Tuesday     3   23.750000   19.750000
13 2015-01-14      21      17  Wednesday     3   23.750000   19.750000
14 2015-01-15      28      17   Thursday     3   23.750000   19.750000
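
The same "repeat each weekly mean once per row" idea can also be expressed as an index lookup, which avoids counting rows per week and does not depend on the ordering of a `Counter`. The following is only a minimal sketch, assuming a recent pandas (1.1 or later, where `DatetimeIndex.isocalendar()` is available) and a DataFrame with the dates as its index; the names `df`, `week`, `weekly_avg`, `row_means` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 15 consecutive days starting 2015-01-01.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=15, freq="D")
df = pd.DataFrame(
    {
        "Value1": rng.integers(1, 31, len(dates)),
        "Value2": rng.integers(1, 31, len(dates)),
    },
    index=dates,
)

# ISO calendar week of each row, as a plain integer Series indexed like df.
week = df.index.isocalendar().week.astype(int)

# One row of means per calendar week (index = week number).
weekly_avg = df.groupby(week).mean()

# Look up the weekly mean for every row by its week number, then relabel
# the rows with the original dates so the two frames align.
row_means = weekly_avg.loc[week.to_numpy()].set_axis(df.index)

# Deviation of each row from the mean of its calendar week.
result = df - row_means
print(result)
```

Because the lookup is keyed by week number, it should also work when the rows are not sorted by date, which the list-building approach above relies on.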

Answer 1: (score: 1)

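I found a solution that works on the whole DataFrame. I added a column 'KW' for the calendar week and then did a groupby on it with a lambda function that subtracts the mean of calendar week "1" from the current values in calendar week "1", and so on for the other weeks.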

df_pct_Max['KW'] = df_pct_Max.index.week
dfresult = df_pct_Max.groupby(by='KW').transform(lambda x: x - x.mean())

This works for me.
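
For context, the `axis='KW'` attempt in the question fails because `DataFrame.sub` only accepts `0`/`'index'` or `1`/`'columns'` for `axis`; grouping by the week and using `transform` sidesteps that alignment problem. Below is a small sketch of the same approach for newer pandas versions, where `DatetimeIndex.week` is no longer available and `isocalendar().week` is used instead. The sample dates and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with a DatetimeIndex, standing in for df_pct_Max.
df_pct_Max = pd.DataFrame(
    {"Value1": [5.0, 3.0, 4.0, 6.0], "Value2": [6.0, 2.0, 8.0, 4.0]},
    index=pd.to_datetime(
        ["2015-01-01", "2015-01-08", "2016-01-07", "2016-01-14"]
    ),
)

# ISO calendar week as a plain integer Series (replaces index.week).
kw = df_pct_Max.index.isocalendar().week.astype(int).rename("KW")

# transform("mean") returns the weekly mean broadcast back to every row,
# so a plain subtraction gives the deviation from the weekly mean.
dfresult = df_pct_Max - df_pct_Max.groupby(kw).transform("mean")
print(dfresult)
```

Here the week is passed to `groupby` as a separate Series rather than stored as a column, so the result keeps only the value columns.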

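It would be nicer to be able to adjust the time range of the mean, for example subtracting from the current calendar week "1" the mean of that calendar week over the past 3 years or so. But that seems fairly complicated, and this solution is sufficient for the current analysis.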
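
One possible way to get such an adjustable window is sketched below. It rests on assumptions that the post does not settle: the baseline for a given week is taken as the mean of the per-year means of that ISO week over a trailing window of `n_years` years, the window includes the current year, and fewer years are used at the start of the data. The function name `subtract_rolling_week_mean` and the parameter `n_years` are hypothetical, and a pandas version with `isocalendar()` (1.1 or later) is assumed.

```python
import pandas as pd


def subtract_rolling_week_mean(df, n_years=3):
    """Subtract from each row the trailing n_years mean of its ISO week."""
    iso = df.index.isocalendar()
    year = iso["year"].astype(int)
    week = iso["week"].astype(int)

    # Mean per (ISO year, ISO week); rows come out sorted by year, then week.
    per_year_week = df.groupby([year, week]).mean()

    # Within each week, average the yearly means over a trailing window of
    # years (assumption: window includes the current year).
    baseline = per_year_week.groupby(level="week").transform(
        lambda s: s.rolling(n_years, min_periods=1).mean()
    )

    # Look up the baseline for each row's (year, week) pair and relabel
    # the rows with the original dates so the subtraction aligns.
    keys = pd.MultiIndex.from_arrays([year, week])
    row_baseline = baseline.reindex(keys).set_axis(df.index)
    return df - row_baseline


# Hypothetical usage: one observation in calendar week 2 of four years.
demo = pd.DataFrame(
    {"Value1": [1.0, 3.0, 5.0, 7.0]},
    index=pd.to_datetime(
        ["2015-01-08", "2016-01-14", "2017-01-12", "2018-01-11"]
    ),
)
out = subtract_rolling_week_mean(demo, n_years=3)
print(out)
```

To exclude the current year from the baseline instead, the rolling mean could be shifted by one row within each week group; which definition is appropriate depends on the analysis.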