Below is the code that generates my data, together with the groupby call that groups it by year and weekofyear:
import pandas as pd
import numpy as np
np.random.seed(100)
pd.options.display.max_rows = 1000
df = pd.DataFrame(np.random.randint(0,10,size=(600, 2)), columns=list('AB'))
df['date'] = pd.DataFrame(pd.date_range(start='1/1/2018', end='8/23/2019'))
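# .dt.weekofyear was removed in pandas 2.0; .dt.isocalendar().week gives the same ISO week number
df['weekofyear'] = df.date.dt.isocalendar().week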
df['year'] = df.date.dt.year
df1 = df.groupby(['year', 'weekofyear']).agg({'A':'sum', 'B':'mean'})
Here are the first rows of df (index 0 through 10):
A B date weekofyear year
0 8 8 2018-01-01 1 2018
1 3 7 2018-01-02 1 2018
2 7 0 2018-01-03 1 2018
3 4 2 2018-01-04 1 2018
4 5 2 2018-01-05 1 2018
5 2 2 2018-01-06 1 2018
6 1 0 2018-01-07 1 2018
7 8 4 2018-01-08 2 2018
8 0 9 2018-01-09 2 2018
9 6 2 2018-01-10 2 2018
10 4 1 2018-01-11 2 2018
Here are the first 10 rows of df1:
year weekofyear A B
2018 1 31 3.375000
2 30 4.285714
3 26 4.142857
4 37 3.142857
5 19 6.142857
6 34 4.142857
7 30 4.142857
8 43 4.571429
9 35 5.142857
10 24 4.000000
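However, the first row of df1 (which should correspond to the first 7 days of df) shows an incorrect sum for column A (**it shows 31, but it should be 30**).
8 + 3 + 7 + 4 + 5 + 2 + 1 = 30
8 + 3 + 7 + 4 + 5 + 2 + 1 != 31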
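The same check can be scripted from the printed rows. The sketch below hard-codes the A and B values of the seven rows dated 2018-01-01 through 2018-01-07 (copied by hand from the df output above, not regenerated) and compares them with the first row of df1. The variable names are illustrative only.

```python
# A and B values of the seven rows dated 2018-01-01 .. 2018-01-07,
# copied by hand from the printed df above.
a_week1 = [8, 3, 7, 4, 5, 2, 1]
b_week1 = [8, 7, 0, 2, 2, 2, 0]

sum_a = sum(a_week1)                  # 30, not the 31 reported in df1
mean_b = sum(b_week1) / len(b_week1)  # 3.0, not the 3.375 reported in df1

# A mean of 3.375 must equal (integer sum) / (row count).
# Find the row counts from 1 to 10 for which 3.375 * n is a whole number.
rows_implied = [n for n in range(1, 11) if (3.375 * n).is_integer()]

print(sum_a, mean_b, rows_implied)  # 30 3.0 [8]
```

Both aggregates disagree with the seven rows, and the reported mean of B is only consistent with 8 rows in the group, which suggests that one extra row is being counted in (2018, 1).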
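Answer 0: (score: 2)
The problem is that the row for 2018-12-31 also falls into that group, because weekofyear follows the ISO week date system: 2018-12-31 is a Monday and belongs to ISO week 1 of 2019, while its calendar year is still 2018. Grouping by year and weekofyear therefore puts it in (2018, 1), and its A value of 1 turns 30 into 31: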
print (df[df['weekofyear'] == 1])
A B date weekofyear year
0 8 8 2018-01-01 1 2018
1 3 7 2018-01-02 1 2018
2 7 0 2018-01-03 1 2018
3 4 2 2018-01-04 1 2018
4 5 2 2018-01-05 1 2018
5 2 2 2018-01-06 1 2018
6 1 0 2018-01-07 1 2018
364 1 6 2018-12-31 1 2018
365 5 5 2019-01-01 1 2019
366 2 0 2019-01-02 1 2019
367 9 3 2019-01-03 1 2019
368 4 7 2019-01-04 1 2019
369 4 7 2019-01-05 1 2019
370 9 4 2019-01-06 1 2019
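One way to avoid the mismatch, sketched here under the assumption that the goal is one row per ISO week, is to group by the ISO year instead of the calendar year. Series.dt.isocalendar() (available since pandas 1.1) returns year, week and day columns that follow the ISO calendar. The sample frame below hard-codes the dates and A values of the rows printed above; the names sample, iso_year, iso_week, mixed and fixed are illustrative and not part of the original code.

```python
import pandas as pd

# Dates and A values copied from the rows printed above:
# the first week of 2018, 2018-12-31, and the first six days of 2019.
sample = pd.DataFrame({
    "A": [8, 3, 7, 4, 5, 2, 1, 1, 5, 2, 9, 4, 4, 9],
    "date": pd.to_datetime([
        "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04",
        "2018-01-05", "2018-01-06", "2018-01-07",
        "2018-12-31", "2019-01-01", "2019-01-02", "2019-01-03",
        "2019-01-04", "2019-01-05", "2019-01-06",
    ]),
})

# Calendar year, as in the question.
sample["year"] = sample["date"].dt.year

# ISO year and ISO week; cast to plain int for simple indexing later.
iso = sample["date"].dt.isocalendar()
sample["iso_year"] = iso["year"].astype(int)
sample["iso_week"] = iso["week"].astype(int)

# Calendar year + ISO week: 2018-12-31 is counted in (2018, 1).
mixed = sample.groupby(["year", "iso_week"])["A"].sum()

# ISO year + ISO week: 2018-12-31 is counted in (2019, 1).
fixed = sample.groupby(["iso_year", "iso_week"])["A"].sum()

print(mixed.loc[(2018, 1)])  # 31
print(fixed.loc[(2018, 1)])  # 30
print(fixed.loc[(2019, 1)])  # 34
```

With the ISO year as the key, 2018-12-31 moves into (2019, 1) together with the first six days of 2019, and the first week of 2018 sums to 30 as expected.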