I'm learning pandas and I want to create a new column for my data (I'm using the national names data).
I'm working with the years 1880 and 1881.
name sex births year
0 Mary F 7065 1880
1 Anna F 2604 1880
2 Emma F 2003 1880
3 Elizabeth F 1939 1880
4 Worthy M 5 1880
5 Wright M 5 1880
6 York M 5 1880
7 Zachariah M 5 1880
8 Mary F 6919 1881
9 Anna F 2698 1881
10 Emma F 2034 1881
11 Elizabeth F 1852 1881
12 Wilton M 5 1881
13 Wing M 5 1881
14 Wood M 5 1881
15 Wright M 5 1881
I'm building the total births table:
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)
which gives:
sex F M
year
1880 13611 20
1881 13503 20
Now I want to add another column to the data that divides each row's births by that year's total births.
For example:
name sex births year ratio
Mary F 7065 1880 7065/13611
Wilton M 5 1881 5/13503
I'm trying:
new = (names.groupby(['year', 'sex'])).assign(ratio= (names.groupby(['year','sex'])).names['births'] / total_births )
which gives:
AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method
OR
I tried breaking it into steps:
ratio = names.groupby(['year','sex'])
ratio1 = ratio.loc[:,'births']
but that gives:
AttributeError: Cannot access callable attribute 'loc' of 'DataFrameGroupBy' objects, try using the 'apply' method
Answer 0 (score: 3)
I think you need groupby with transform and sum, and then divide with div:
rat = names.groupby(['year','sex'])['births'].transform('sum')
print (rat)
0 13611
1 13611
2 13611
3 13611
4 20
5 20
6 20
7 20
8 13503
9 13503
10 13503
11 13503
12 20
13 20
14 20
15 20
Name: births, dtype: int64
names['ratio'] = names.births.div(rat)
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
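For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame can be rebuilt by hand from the 1880 rows shown in the question; `group_total` and `check` are names made up for illustration. The point it tries to show is that `transform('sum')` returns one value per original row, so the division lines up row by row without any merge.

```python
import pandas as pd

# Sample rows copied from the question (1880 only, to keep it short).
names = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Elizabeth",
             "Worthy", "Wright", "York", "Zachariah"],
    "sex": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "births": [7065, 2604, 2003, 1939, 5, 5, 5, 5],
    "year": [1880] * 8,
})

# transform('sum') broadcasts each (year, sex) total back onto every row
# of that group, keeping the original index of `names`.
group_total = names.groupby(["year", "sex"])["births"].transform("sum")
names["ratio"] = names["births"] / group_total

# Sanity check: within each (year, sex) group the ratios should add up to 1.
check = names.groupby(["year", "sex"])["ratio"].sum()

print(names)
print(check)
```

The sanity check at the end is optional, but it is a cheap way to confirm the grouping keys are the ones you intended.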
Solution with assign:
names = names.assign(ratio=lambda x: x.births.div(rat))
print (names)
name sex births year ratio
0 Mary F 7065 1880 0.519065
1 Anna F 2604 1880 0.191316
2 Emma F 2003 1880 0.147160
3 Elizabeth F 1939 1880 0.142458
4 Worthy M 5 1880 0.250000
5 Wright M 5 1880 0.250000
6 York M 5 1880 0.250000
7 Zachariah M 5 1880 0.250000
8 Mary F 6919 1881 0.512405
9 Anna F 2698 1881 0.199807
10 Emma F 2034 1881 0.150633
11 Elizabeth F 1852 1881 0.137155
12 Wilton M 5 1881 0.250000
13 Wing M 5 1881 0.250000
14 Wood M 5 1881 0.250000
15 Wright M 5 1881 0.250000
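If you would rather not keep a separate intermediate Series around, the group total can probably be computed inside the lambda itself. This is only a sketch of that variation, not something from the original data run: the frame below is a trimmed, hand-built subset of the 1881 rows (so the F total here is smaller than in the full data), and `with_ratio` is a name made up for illustration.

```python
import pandas as pd

# Trimmed subset of the 1881 rows from the question (assumption: only
# these four rows exist, so the group totals differ from the full data).
names = pd.DataFrame({
    "name": ["Mary", "Anna", "Wilton", "Wing"],
    "sex": ["F", "F", "M", "M"],
    "births": [6919, 2698, 5, 5],
    "year": [1881] * 4,
})

# assign returns a new DataFrame; the lambda receives the frame being
# assigned to, so the grouped total is computed on that same frame.
with_ratio = names.assign(
    ratio=lambda d: d["births"]
    / d.groupby(["year", "sex"])["births"].transform("sum")
)

print(with_ratio)
```

Because `assign` does not modify `names` in place, this form also chains cleanly after other steps such as filtering by year.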