Strange results when grouping by week

Time: 2018-09-11 06:06:43

Tags: python pandas pandas-groupby

Below is the code that generates the data, followed by the groupby call that groups by weekofyear, which is what my question is about:

import pandas as pd
import numpy as np

np.random.seed(100)

pd.options.display.max_rows = 1000
df = pd.DataFrame(np.random.randint(0,10,size=(600, 2)), columns=list('AB'))
df['date'] = pd.date_range(start='1/1/2018', end='8/23/2019')  # 600 consecutive days

# ISO 8601 week number; the older dt.weekofyear accessor was removed in pandas 2.0
df['weekofyear'] = df.date.dt.isocalendar().week
df['year'] = df.date.dt.year

df1 = df.groupby(['year', 'weekofyear']).agg({'A':'sum', 'B':'mean'})

Here are the first 11 rows of df:

    A   B   date    weekofyear  year
0   8   8   2018-01-01  1   2018
1   3   7   2018-01-02  1   2018
2   7   0   2018-01-03  1   2018
3   4   2   2018-01-04  1   2018
4   5   2   2018-01-05  1   2018
5   2   2   2018-01-06  1   2018
6   1   0   2018-01-07  1   2018
7   8   4   2018-01-08  2   2018
8   0   9   2018-01-09  2   2018
9   6   2   2018-01-10  2   2018
10  4   1   2018-01-11  2   2018

Here are the first 10 rows of df1:

year      weekofyear   A     B  
2018      1           31    3.375000
          2           30    4.285714
          3           26    4.142857
          4           37    3.142857
          5           19    6.142857
          6           34    4.142857
          7           30    4.142857
          8           43    4.571429
          9           35    5.142857
          10          24    4.000000

However, the first row of df1 (which should correspond to the first 7 days of df) shows the wrong sum for column A: it shows 31, but it should be 30.

8 + 3 + 7 + 4 + 5 + 2 + 1 = 30, not 31

1 Answer:

Answer 0: (score: 2)

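The problem is that the 2018-12-31 row is also included in that group, because weekofyear follows the ISO week date. 2018-12-31 is a Monday that belongs to ISO week 1 of 2019, while its calendar year is still 2018, so it falls into the (2018, 1) group and adds its A value of 1 to the sum (30 + 1 = 31).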

print (df[df['weekofyear'] == 1])

     A  B       date  weekofyear  year
0    8  8 2018-01-01           1  2018
1    3  7 2018-01-02           1  2018
2    7  0 2018-01-03           1  2018
3    4  2 2018-01-04           1  2018
4    5  2 2018-01-05           1  2018
5    2  2 2018-01-06           1  2018
6    1  0 2018-01-07           1  2018
364  1  6 2018-12-31           1  2018
365  5  5 2019-01-01           1  2019
366  2  0 2019-01-02           1  2019
367  9  3 2019-01-03           1  2019
368  4  7 2019-01-04           1  2019
369  4  7 2019-01-05           1  2019
370  9  4 2019-01-06           1  2019
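
One possible way to avoid the mismatch shown in the output above is to group by the ISO year instead of the calendar year, so that 2018-12-31 goes together with the first days of January 2019. The following is a minimal sketch, assuming pandas 1.1 or later (where `Series.dt.isocalendar()` is available); the column names `iso_year` and `iso_week` and the variable `df_iso` are made up for this illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the data from the question
np.random.seed(100)
df = pd.DataFrame(np.random.randint(0, 10, size=(600, 2)), columns=list("AB"))
df["date"] = pd.date_range(start="1/1/2018", end="8/23/2019")

# isocalendar() returns a DataFrame with the columns year, week and day,
# all following ISO 8601, so year and week are always consistent
iso = df["date"].dt.isocalendar()
df["iso_year"] = iso["year"]  # hypothetical column name
df["iso_week"] = iso["week"]  # hypothetical column name

# Group by ISO year + ISO week instead of calendar year + ISO week
df_iso = df.groupby(["iso_year", "iso_week"]).agg({"A": "sum", "B": "mean"})

# 2018-12-31 (row 364) now belongs to ISO year 2019
print(df.loc[364, ["date", "iso_year", "iso_week"]])

# The (2018, 1) group now holds only the first 7 days of df
print(df_iso.loc[(2018, 1)])
```

With this grouping the first week of 2018 should contain only rows 0 to 6, and the 2018-12-31 row is counted in the (2019, 1) group instead. Whether ISO years are the right grouping key depends on how the weekly totals are meant to be reported.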