year x y
1987 1.609438 0
1988 1.386294 0
1989 1.098612 1
1987 0.693147 0
1988 0.000000 0
1989 -0.693147 1
...
So I can get the per-year means of x and y:
>>> df.groupby('year')[['x', 'y']].mean()
x y
year meanX meanY
1987 0.597434 0.000000
1988 0.428441 0.351852
1989 0.155169 0.185185
How can I add new columns that attach each year's means to every row? I mean I want something like this:
year x y meanX meanY
1987 1.609438 0 0.597434 0.000000
1988 1.386294 0 0.428441 0.351852
1989 1.098612 1 0.155169 0.185185
1987 0.693147 0 0.597434 0.000000
1988 0.000000 0 0.428441 0.351852
1989 -0.693147 1 0.155169 0.185185
What is the right way to do this?
Answer 0 (score: 1)
# transform returns one value per original row, so the result aligns with df
df['x_mean'] = df.groupby('year')['x'].transform('mean')
df['y_mean'] = df.groupby('year')['y'].transform('mean')
>>> df
year x y x_mean y_mean
0 1987 1.609438 0 1.151293 0
1 1988 1.386294 0 0.693147 0
2 1989 1.098612 1 0.202733 1
3 1987 0.693147 0 1.151293 0
4 1988 0.000000 0 0.693147 0
5 1989 -0.693147 1 0.202733 1
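As a possible variation on the above, both mean columns can be added in one step by calling transform on a two-column selection. This is only a minimal sketch: it assumes just the six rows visible in the question (the elided rows are not reproduced), and the `_mean` suffix and the names `group_means` and `result` are illustrative choices.

```python
import pandas as pd

# Only the six rows visible in the question; the elided rows are not reproduced.
df = pd.DataFrame({
    'year': [1987, 1988, 1989, 1987, 1988, 1989],
    'x': [1.609438, 1.386294, 1.098612, 0.693147, 0.000000, -0.693147],
    'y': [0, 0, 1, 0, 0, 1],
})

# transform('mean') returns a frame with the same index as df,
# holding each row's group mean for x and y.
group_means = df.groupby('year')[['x', 'y']].transform('mean').add_suffix('_mean')

# Concatenate column-wise on the shared index.
result = pd.concat([df, group_means], axis=1)
print(result)
```

Because transform keeps the original index, no merge key is needed; the concat simply lines the new columns up row by row.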
Answer 1 (score: 0)
pandas.DataFrame.merge should do what you want:
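import pandas as pd

data = [
    {'year': 1987, 'x': 1.5116, 'y': 0},
    {'year': 1988, 'x': 1.135, 'y': 1}
]
df = pd.DataFrame(data)
# Per-year means, indexed by year
means = df.groupby('year')[['x', 'y']].mean()
# Join the means onto each row; overlapping columns from `means` get the 'mean' suffix
df.merge(right=means, left_on='year', right_index=True, suffixes=('', 'mean'))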
Returns:
x y year xmean ymean
0 1.5116 0 1987 1.5116 0
1 1.1350 1 1988 1.1350 1
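The two-row example above has one row per year, so each mean just equals the row's own value. A sketch on the six rows visible in the question may make the behavior clearer, and it also checks the merge result against a groupby transform. The `_mean` suffix and the variable names here are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Only the six rows visible in the question; the elided rows are not reproduced.
df = pd.DataFrame({
    'year': [1987, 1988, 1989, 1987, 1988, 1989],
    'x': [1.609438, 1.386294, 1.098612, 0.693147, 0.000000, -0.693147],
    'y': [0, 0, 1, 0, 0, 1],
})

# Per-year means, indexed by year.
means = df.groupby('year')[['x', 'y']].mean()

# Match df's 'year' column against the index of `means`.
# sort_index() guards against any row reordering by the merge.
merged = df.merge(
    means, left_on='year', right_index=True, suffixes=('', '_mean')
).sort_index()

# Cross-check against the transform approach from the other answer.
expected = df.groupby('year')[['x', 'y']].transform('mean')
same = (
    np.allclose(merged['x_mean'], expected['x'])
    and np.allclose(merged['y_mean'], expected['y'])
)
print(merged)
print(same)
```

The merge route is handy when the aggregated table is needed on its own as well; if only the extra columns are wanted, transform avoids building the intermediate frame.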