Add a new column and compute a ratio

Time: 2016-12-02 13:39:44

Tags: python pandas

I'm learning pandas and want to create a new column in my data (I'm using the national names data).

I'm using the years 1880 and 1881:

          name sex  births  year
0        Mary   F    7065  1880
1        Anna   F    2604  1880
2        Emma   F    2003  1880
3   Elizabeth   F    1939  1880
4      Worthy   M       5  1880
5      Wright   M       5  1880
6        York   M       5  1880
7   Zachariah   M       5  1880
8        Mary   F    6919  1881
9        Anna   F    2698  1881
10       Emma   F    2034  1881
11  Elizabeth   F    1852  1881
12     Wilton   M       5  1881
13       Wing   M       5  1881
14       Wood   M       5  1881
15     Wright   M       5  1881

I compute the total births like this:

total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')

This gives:

sex       F   M
year           
1880  13611  20
1881  13503  20
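
To make the question reproducible, here is a minimal sketch that rebuilds the 16 sample rows shown above by hand (the full national names files are not loaded) and recomputes total_births. It passes the string 'sum' as aggfunc, which recent pandas versions prefer over the built-in sum.

```python
import pandas as pd

# Hand-built copy of the sample rows from the question (not the full dataset)
names = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Elizabeth', 'Worthy', 'Wright', 'York', 'Zachariah',
             'Mary', 'Anna', 'Emma', 'Elizabeth', 'Wilton', 'Wing', 'Wood', 'Wright'],
    'sex': ['F'] * 4 + ['M'] * 4 + ['F'] * 4 + ['M'] * 4,
    'births': [7065, 2604, 2003, 1939, 5, 5, 5, 5, 6919, 2698, 2034, 1852, 5, 5, 5, 5],
    'year': [1880] * 8 + [1881] * 8,
})

# One row per year, one column per sex, each cell is the summed births
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')
print(total_births)
```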

Now I want to create another column in the data that compares each name's births with the total births for that year.

For example:

name  sex births     year ratio
Mary   F   7065      1880  7065/13611
Wilton M     5       1881   5/13503
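
Since total_births already holds the denominators, one option (a sketch, not the approach used in the answer below) is to reshape it to long form with melt and merge it back onto each row by year and sex. The sample rows are rebuilt by hand, and the helper column name total is an illustrative choice. This divides each row by the total for its year and sex, so Wilton comes out as 5/20; 13503 is the 1881 female total.

```python
import pandas as pd

names = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Elizabeth', 'Worthy', 'Wright', 'York', 'Zachariah',
             'Mary', 'Anna', 'Emma', 'Elizabeth', 'Wilton', 'Wing', 'Wood', 'Wright'],
    'sex': ['F'] * 4 + ['M'] * 4 + ['F'] * 4 + ['M'] * 4,
    'births': [7065, 2604, 2003, 1939, 5, 5, 5, 5, 6919, 2698, 2034, 1852, 5, 5, 5, 5],
    'year': [1880] * 8 + [1881] * 8,
})
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc='sum')

# Long form: one row per (year, sex) with the group total in column 'total'
totals = total_births.reset_index().melt(id_vars='year', var_name='sex', value_name='total')

# A left merge keeps the original row order of names
merged = names.merge(totals, on=['year', 'sex'], how='left')
merged['ratio'] = merged['births'] / merged['total']
print(merged)
```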

I tried:

new = (names.groupby(['year', 'sex'])).assign(ratio= (names.groupby(['year','sex'])).names['births'] / total_births )

This gives:

AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method

Alternatively, I tried breaking it into steps:

ratio = names.groupby(['year','sex'])
ratio1 = ratio.loc[:,'births']

But that gives:

AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
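
Both errors have the same cause: groupby returns a DataFrameGroupBy object rather than a DataFrame, so DataFrame methods such as assign and loc are not available on it. The message suggests apply; a minimal sketch of that route selects the births column first and divides each group by its own sum. group_keys=False keeps the result indexed like the original rows, so it can be assigned back as a column. The sample rows are rebuilt by hand.

```python
import pandas as pd

names = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Elizabeth', 'Worthy', 'Wright', 'York', 'Zachariah',
             'Mary', 'Anna', 'Emma', 'Elizabeth', 'Wilton', 'Wing', 'Wood', 'Wright'],
    'sex': ['F'] * 4 + ['M'] * 4 + ['F'] * 4 + ['M'] * 4,
    'births': [7065, 2604, 2003, 1939, 5, 5, 5, 5, 6919, 2698, 2034, 1852, 5, 5, 5, 5],
    'year': [1880] * 8 + [1881] * 8,
})

grouped = names.groupby(['year', 'sex'], group_keys=False)

# A GroupBy object is not a DataFrame, so it has no assign or loc attribute
print(hasattr(grouped, 'assign'), hasattr(grouped, 'loc'))

# Apply a per-group function to the births column; the result keeps the row index
names['ratio'] = grouped['births'].apply(lambda s: s / s.sum())
print(names)
```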

1 Answer:

Answer 0 (score: 3)

I think you need groupby with transform('sum'), then divide by the result with div:

rat = names.groupby(['year','sex'])['births'].transform('sum')
print (rat)
0     13611
1     13611
2     13611
3     13611
4        20
5        20
6        20
7        20
8     13503
9     13503
10    13503
11    13503
12       20
13       20
14       20
15       20
Name: births, dtype: int64
names['ratio'] = names.births.div(rat)
print (names)
         name sex  births  year     ratio
0        Mary   F    7065  1880  0.519065
1        Anna   F    2604  1880  0.191316
2        Emma   F    2003  1880  0.147160
3   Elizabeth   F    1939  1880  0.142458
4      Worthy   M       5  1880  0.250000
5      Wright   M       5  1880  0.250000
6        York   M       5  1880  0.250000
7   Zachariah   M       5  1880  0.250000
8        Mary   F    6919  1881  0.512405
9        Anna   F    2698  1881  0.199807
10       Emma   F    2034  1881  0.150633
11  Elizabeth   F    1852  1881  0.137155
12     Wilton   M       5  1881  0.250000
13       Wing   M       5  1881  0.250000
14       Wood   M       5  1881  0.250000
15     Wright   M       5  1881  0.250000
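
transform fits here because it returns a Series with the same index as names (each row gets its group's total), whereas a plain aggregation such as sum returns one value per (year, sex) group that cannot be divided row by row without realigning. A short sketch on the hand-built sample rows comparing the two, and checking that the ratios in each group add up to 1:

```python
import pandas as pd

names = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Elizabeth', 'Worthy', 'Wright', 'York', 'Zachariah',
             'Mary', 'Anna', 'Emma', 'Elizabeth', 'Wilton', 'Wing', 'Wood', 'Wright'],
    'sex': ['F'] * 4 + ['M'] * 4 + ['F'] * 4 + ['M'] * 4,
    'births': [7065, 2604, 2003, 1939, 5, 5, 5, 5, 6919, 2698, 2034, 1852, 5, 5, 5, 5],
    'year': [1880] * 8 + [1881] * 8,
})

births_by_group = names.groupby(['year', 'sex'])['births']
agg = births_by_group.sum()              # one value per group, indexed by (year, sex)
rat = births_by_group.transform('sum')   # one value per row, same index as names
print(agg.shape, rat.shape, rat.index.equals(names.index))

names['ratio'] = names['births'].div(rat)

# Within each (year, sex) group the ratios should sum to 1, up to rounding
check = names.groupby(['year', 'sex'])['ratio'].sum()
print(check)
```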

Solution with assign:

names = names.assign(ratio=lambda x: x.births.div(rat))
print (names)
         name sex  births  year     ratio
0        Mary   F    7065  1880  0.519065
1        Anna   F    2604  1880  0.191316
2        Emma   F    2003  1880  0.147160
3   Elizabeth   F    1939  1880  0.142458
4      Worthy   M       5  1880  0.250000
5      Wright   M       5  1880  0.250000
6        York   M       5  1880  0.250000
7   Zachariah   M       5  1880  0.250000
8        Mary   F    6919  1881  0.512405
9        Anna   F    2698  1881  0.199807
10       Emma   F    2034  1881  0.150633
11  Elizabeth   F    1852  1881  0.137155
12     Wilton   M       5  1881  0.250000
13       Wing   M       5  1881  0.250000
14       Wood   M       5  1881  0.250000
15     Wright   M       5  1881  0.250000
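
A variant of the assign solution that does not depend on a separately computed rat: the group totals are computed inside the lambda, so the whole step can sit in a method chain. This is a sketch on the hand-built sample rows; assign returns a new DataFrame and leaves names unchanged.

```python
import pandas as pd

names = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Elizabeth', 'Worthy', 'Wright', 'York', 'Zachariah',
             'Mary', 'Anna', 'Emma', 'Elizabeth', 'Wilton', 'Wing', 'Wood', 'Wright'],
    'sex': ['F'] * 4 + ['M'] * 4 + ['F'] * 4 + ['M'] * 4,
    'births': [7065, 2604, 2003, 1939, 5, 5, 5, 5, 6919, 2698, 2034, 1852, 5, 5, 5, 5],
    'year': [1880] * 8 + [1881] * 8,
})

result = names.assign(
    # x is the DataFrame being assigned to; transform broadcasts each group total to its rows
    ratio=lambda x: x['births'] / x.groupby(['year', 'sex'])['births'].transform('sum')
)
print(result)
```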