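I would like to use the df below to compute df = df.groupby(['id','quarter'])['jobs'].mean(),
but the resulting data frame must also have another column holding the mean of jobs per id and year.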
id year quarter month jobs
1 2007 1 1 10
1 2007 1 2 12
1 2007 1 3 12
1 2007 2 4 12
1 2007 2 5 12
1 2007 2 6 13
1 2007 3 7 14
1 2007 3 8 9
1 2007 3 9 12
1 2007 4 10 15
1 2007 4 12 18
2 2007 1 1 15
2 2007 1 2 15
2 2007 1 3 16
2 2007 2 4 17
2 2007 2 5 18
2 2007 2 6 10
2 2007 3 7 12
2 2007 3 8 12
2 2007 3 9 12
2 2007 4 10 12
2 2007 4 11 13
2 2007 4 12 14
The result should look like this:
id year quarter jobs jobs_year
1 2007 1 (mean quarter) (mean year)
1 2007 2 (mean quarter) (mean year)
1 2007 3 (mean quarter) (mean year)
1 2007 4 (mean quarter) (mean year)
2 2007 1 (mean quarter) (mean year)
2 2007 2 (mean quarter) (mean year)
2 2007 3 (mean quarter) (mean year)
2 2007 4 (mean quarter) (mean year)
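Answer 0 (score: 3)
Use transform to broadcast each group mean back onto every row,
then drop_duplicates to keep one row per id, year and quarter.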
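# Quarterly mean, repeated on every monthly row of the same (id, quarter) group.
# If the data spans several years, add 'year' to the keys so quarters are not pooled across years.
df['jobs1']=df.groupby(['id','quarter'])['jobs'].transform('mean')
# Yearly mean, repeated on every monthly row of the same (id, year) group.
df['jobs_year']=df.groupby(['id','year'])['jobs'].transform('mean')
# Keep the first row of each (id, year, quarter); month and jobs keep that first row's values.
df=df.drop_duplicates(['id','year','quarter'])
df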
Out[305]:
id year quarter month jobs jobs1 jobs_year
0 1 2007 1 1 10 11.333333 12.636364
3 1 2007 2 4 12 12.333333 12.636364
6 1 2007 3 7 14 11.666667 12.636364
9 1 2007 4 10 15 16.500000 12.636364
11 2 2007 1 1 15 15.333333 13.833333
14 2 2007 2 4 17 15.000000 13.833333
17 2 2007 3 7 12 12.000000 13.833333
20 2 2007 4 10 12 13.000000 13.833333
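Note that the output above still carries the month column and the first monthly jobs value of each quarter, while the requested layout has only id, year, quarter, jobs and jobs_year. One possible way to get exactly that layout is to aggregate twice and merge, sketched below. This is an alternative to the transform approach, not the answer's own method; the names raw, quarterly, yearly and result are made up for illustration, and grouping the quarterly mean by year as well is an assumption made so that quarters from different years would not be mixed.

```python
import io

import pandas as pd

# Sample data copied from the question (id 1 has no row for month 11).
raw = """id year quarter month jobs
1 2007 1 1 10
1 2007 1 2 12
1 2007 1 3 12
1 2007 2 4 12
1 2007 2 5 12
1 2007 2 6 13
1 2007 3 7 14
1 2007 3 8 9
1 2007 3 9 12
1 2007 4 10 15
1 2007 4 12 18
2 2007 1 1 15
2 2007 1 2 15
2 2007 1 3 16
2 2007 2 4 17
2 2007 2 5 18
2 2007 2 6 10
2 2007 3 7 12
2 2007 3 8 12
2 2007 3 9 12
2 2007 4 10 12
2 2007 4 11 13
2 2007 4 12 14
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Mean of the monthly jobs values per id, year and quarter.
quarterly = df.groupby(["id", "year", "quarter"], as_index=False)["jobs"].mean()

# Mean of the monthly jobs values per id and year, taken from the monthly rows
# (not the mean of the quarterly means, which differs when a month is missing).
yearly = (
    df.groupby(["id", "year"])["jobs"]
    .mean()
    .rename("jobs_year")
    .reset_index()
)

# Attach the yearly mean to every quarterly row.
result = quarterly.merge(yearly, on=["id", "year"], how="left")
print(result)
```

The yearly figures agree with the jobs_year column shown above because both are computed from the monthly rows; averaging the four quarterly means instead would give a slightly different number for id 1, whose last quarter has only two months.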