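I am looking for a way to compute two new columns in a DataFrame. I have two variables, gender and income, and I need two new columns, female_average_income and male_average_income, in the same df. I have found several approaches using groupby and aggregation, but that is not what I need: I want just two ordinary columns in the same df. Can anyone help?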
Answer 0 (score: 1)
A simple approach is to make two loc calls, each filtering on gender:
In [390]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'gender':['m','f','m','m','f'], 'income':np.random.randn(5)})
df
Out[390]:
  gender    income
0      m -0.960345
1      f  0.876803
2      m -0.328706
3      m -0.826363
4      f  0.763037
In [391]:
# Each group's mean is written only to that group's rows; the other rows stay NaN.
df.loc[df.gender=='f', 'female_avg_income'] = df.loc[df.gender=='f', 'income'].mean()
df.loc[df.gender=='m', 'male_avg_income'] = df.loc[df.gender=='m', 'income'].mean()
df
Out[391]:
  gender    income  female_avg_income  male_avg_income
0      m -0.960345                NaN        -0.705138
1      f  0.876803            0.81992              NaN
2      m -0.328706                NaN        -0.705138
3      m -0.826363                NaN        -0.705138
4      f  0.763037            0.81992              NaN
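If the goal is for every row to carry both averages, rather than NaN in the rows of the other gender, one option is to assign each scalar mean to a whole column so that it is broadcast to all rows. The following is a minimal sketch under that assumption; the income values are made up for illustration and are not the random values shown above.

```python
import pandas as pd

# Illustrative data; these income values are assumptions for the sketch.
df = pd.DataFrame({
    'gender': ['m', 'f', 'm', 'm', 'f'],
    'income': [10.0, 20.0, 30.0, 50.0, 60.0],
})

# Assigning a scalar to a column broadcasts it to every row,
# so no row is left as NaN.
df['female_avg_income'] = df.loc[df.gender == 'f', 'income'].mean()
df['male_avg_income'] = df.loc[df.gender == 'm', 'income'].mean()

print(df)
```

Whether NaN or a repeated value is preferable depends on how the columns are used afterwards; the masked assignment keeps each average only on the rows it was computed from.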
A better approach is to use transform on the groupby object, which returns data aligned with the original index:
In [392]:
group_mean = df.groupby('gender')['income'].transform('mean')
df.loc[df.gender=='f', 'female_avg_income'] = group_mean
df.loc[df.gender=='m', 'male_avg_income'] = group_mean
df
Out[392]:
  gender    income  female_avg_income  male_avg_income
0      m -0.960345                NaN        -0.705138
1      f  0.876803            0.81992              NaN
2      m -0.328706                NaN        -0.705138
3      m -0.826363                NaN        -0.705138
4      f  0.763037            0.81992              NaN
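To make the alignment visible, the sketch below prints what transform returns before anything is assigned: one value per original row, on the same index as df. It then uses Series.where as an alternative to the masked loc assignment. The income values and the name group_mean are assumptions chosen for illustration, not part of the answer's data.

```python
import pandas as pd

# Illustrative data; these income values are assumptions for the sketch.
df = pd.DataFrame({
    'gender': ['m', 'f', 'm', 'm', 'f'],
    'income': [10.0, 20.0, 30.0, 50.0, 60.0],
})

# transform('mean') returns the mean of each row's own group,
# with the same length and index as df.
group_mean = df.groupby('gender')['income'].transform('mean')
print(group_mean)

# Series.where keeps values where the condition is True and
# fills NaN elsewhere, reproducing the masked layout.
df['female_avg_income'] = group_mean.where(df['gender'] == 'f')
df['male_avg_income'] = group_mean.where(df['gender'] == 'm')

print(df)
```

Because group_mean shares the index of df, it can be assigned or masked directly without any merge step.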