Python Pandas DF: calculate new columns (mean by group)

Time: 2015-02-20 10:20:39

Tags: python pandas dataframe

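I'm looking for a way to compute two new columns in a DataFrame. I have two variables, gender and income, and I need two new columns, female_average_income and male_average_income, in the same df. I've found several approaches based on groupby and aggregation, but that's not what I need: I just want two ordinary columns in the same df. Can anyone help?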

1 answer:

Answer 0 (score: 1)

A simple way is to use two loc calls, each filtering on gender:

In [390]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'gender':['m','f','m','m','f'], 'income':np.random.randn(5)})
df
Out[390]:
  gender    income
0      m -0.960345
1      f  0.876803
2      m -0.328706
3      m -0.826363
4      f  0.763037
In [391]:

df.loc[df.gender=='f', 'female_avg_income'] = df.loc[df.gender=='f', 'income'].mean()
df.loc[df.gender=='m', 'male_avg_income'] = df.loc[df.gender=='m', 'income'].mean()
df
Out[391]:
  gender    income  female_avg_income  male_avg_income
0      m -0.960345                NaN        -0.705138
1      f  0.876803            0.81992              NaN
2      m -0.328706                NaN        -0.705138
3      m -0.826363                NaN        -0.705138
4      f  0.763037            0.81992              NaN

A better approach is to use transform on the groupby object, which returns a result aligned with the original index:

In [392]:

group_mean = df.groupby('gender')['income'].transform('mean')  # one value per row: the mean of that row's gender group
df.loc[df.gender=='f', 'female_avg_income'] = group_mean
df.loc[df.gender=='m', 'male_avg_income'] = group_mean
df
Out[392]:
  gender    income  female_avg_income  male_avg_income
0      m -0.960345                NaN        -0.705138
1      f  0.876803            0.81992              NaN
2      m -0.328706                NaN        -0.705138
3      m -0.826363                NaN        -0.705138
4      f  0.763037            0.81992              NaN
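
As a self-contained illustration of why transform aligns with the original rows, here is a minimal sketch. The income values are made up (fixed numbers rather than random ones, so the result is reproducible), and using Series.where to blank out the other gender is one possible variation, not the method from the answer above.

```python
import pandas as pd

# Illustrative, made-up data: fixed incomes so the result is reproducible.
df = pd.DataFrame({
    'gender': ['m', 'f', 'm', 'm', 'f'],
    'income': [10.0, 20.0, 30.0, 50.0, 60.0],
})

# transform returns one value per original row, indexed like df:
# each row receives the mean income of its own gender group.
group_mean = df.groupby('gender')['income'].transform('mean')

# Series.where keeps values where the condition is True and
# fills NaN elsewhere, giving one column per gender.
df['female_avg_income'] = group_mean.where(df['gender'] == 'f')
df['male_avg_income'] = group_mean.where(df['gender'] == 'm')

print(df)
```

Because group_mean shares its index with df, it can be assigned or masked directly without any merge. If the group mean is wanted in every row rather than only in the rows of the matching gender, group_mean can be assigned to a single column as it is.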