I have a pandas DataFrame:
                              emp    laborforce  emp_rate
occ statefip quarter
10  1        0       6.561213e+06  7.017537e+06  0.934974
             4       8.580723e+06  9.114996e+06  0.941385
             8       8.588012e+06  9.102831e+06  0.943444
             12      2.093297e+06  2.220923e+06  0.942535
    2        0       6.561208e+06  7.017527e+06  0.934974
Now I want to merge the mean of emp_rate for each (occ, statefip) group into this dataset. I tried:
df2 = df1.groupby(level=[0, 1])['emp_rate'].mean()
df2.name = 'emp_rate_mean'
df1.join(df2, how='inner')
NotImplementedError: merging with more than one level overlap on a multi-index is not implemented
Apparently this kind of join works as of pandas 0.14 if the second DataFrame has a single-level index. Here it does not. What is the correct approach in this case?
Answer 0 (score: 1)
In [102]: df['emp_rate_avg'] = df.groupby(level=[0, 1])['emp_rate'].transform('mean')
In [103]: df
Out[103]:
                            emp  laborforce  emp_rate  emp_rate_avg
occ  statefip quarter
10.0 1.0      0       6561213.0   7017537.0  0.934974      0.940584
              4       8580723.0   9114996.0  0.941385      0.940584
              8       8588012.0   9102831.0  0.943444      0.940584
              12      2093297.0   2220923.0  0.942535      0.940584
     2.0      0       6561208.0   7017527.0  0.934974      0.934974
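
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the five rows from the question by hand (the construction code is my assumption, since the question only shows the printed frame), applies the transform approach from this answer, and also shows one possible alternative closer to the original join attempt: aggregate first, then merge on the key columns after resetting the index. The names `means` and `merged` are just illustrative.

```python
import pandas as pd

# Rebuild the frame from the question; values are copied from its printout.
index = pd.MultiIndex.from_tuples(
    [(10, 1, 0), (10, 1, 4), (10, 1, 8), (10, 1, 12), (10, 2, 0)],
    names=["occ", "statefip", "quarter"],
)
df = pd.DataFrame(
    {
        "emp": [6.561213e06, 8.580723e06, 8.588012e06, 2.093297e06, 6.561208e06],
        "laborforce": [7.017537e06, 9.114996e06, 9.102831e06, 2.220923e06, 7.017527e06],
        "emp_rate": [0.934974, 0.941385, 0.943444, 0.942535, 0.934974],
    },
    index=index,
)

# transform returns one value per original row, aligned to the original index,
# so the group mean can be assigned directly without any join.
df["emp_rate_avg"] = (
    df.groupby(level=["occ", "statefip"])["emp_rate"].transform("mean")
)

# Alternative sketch: aggregate to one row per (occ, statefip), then merge on
# ordinary columns instead of joining on overlapping MultiIndex levels.
means = (
    df.groupby(level=["occ", "statefip"])["emp_rate"]
    .mean()
    .rename("emp_rate_mean")
)
merged = (
    df.reset_index()
    .merge(means.reset_index(), on=["occ", "statefip"], how="inner")
    .set_index(["occ", "statefip", "quarter"])
)

print(merged)
```

Both routes should give the same per-group mean. The transform version is shorter and keeps the original index untouched, which is why I would reach for it first.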