How do I calculate percentages from grouped columns using Pandas in Python?

Time: 2016-05-08 02:08:16

Tags: python pandas

New to pandas, and I've run into a simple problem I can't figure out.

I have a dataset of US baby names that looks like this:

[Image: original data]

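I'm trying to write a program where I can input a list of names and get back the percentage of each name that is male vs. female (the year is irrelevant for my purposes).

I did a groupby and then added the male and female counts together.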

[Image: data after groupby]

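Now I need to calculate percentages from this data. I think it's some kind of transform (right?), but I can't seem to write anything that works. I know how I would do it in SQL, but I really want to figure out Pandas. Any pointers would be much appreciated!

Thanks!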
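The guess about transform is on the right track when the data is kept in long format. A minimal sketch, assuming a hypothetical frame with Name, Gender, Year and Count columns (the real dataset's column names are not visible here, so these names and the sample counts are assumptions): `groupby(...).transform("sum")` broadcasts each name's total back onto every row, so the percentage is a plain row-by-row division.

```python
import pandas as pd

# Hypothetical long-format rows; the same (Name, Gender) pair can
# appear once per Year, as in the question's dataset.
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aaden", "Aadi", "Aadi", "Aaban"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Year":   [2010, 2010, 2011, 2010, 2010, 2011],
    "Count":  [5, 60, 35, 10, 30, 12],
})

# Collapse the years: one row per (Name, Gender) with the summed count.
totals = df.groupby(["Name", "Gender"], as_index=False)["Count"].sum()

# transform("sum") returns a Series aligned with `totals` that repeats
# each name's total (both genders) on every row of that name.
name_total = totals.groupby("Name")["Count"].transform("sum")
totals["Percent"] = totals["Count"] / name_total * 100

print(totals)
```

This keeps one row per (Name, Gender) instead of F/M columns, which is close to how SQL would express it with a window function such as `SUM(Count) OVER (PARTITION BY Name)`.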

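3 answers:

Answer 0 (score: 1)

If I understand what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new frame rather than changing n in place). Then compute the percentage and assign the result to a new column. For the female percentage: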

n['%F'] = n[('Count', 'F')] / n['sum'] * 100
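A self-contained sketch of this answer's approach. The frame `n` below is a hypothetical stand-in built with `groupby(...).unstack()`, which produces the same ('Count', 'F') / ('Count', 'M') column layout as the question; the names and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical per-name counts (invented sample data).
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aadi", "Aadi", "Aaban"],
    "Gender": ["F", "M", "F", "M", "M"],
    "Count":  [5, 95, 10, 30, 12],
})

# Unstacking Gender gives MultiIndex columns ('Count', 'F'), ('Count', 'M').
n = df.groupby(["Name", "Gender"])[["Count"]].sum().unstack()

# Assigning a plain key to MultiIndex columns creates ('sum', '').
n["sum"] = n.sum(axis=1)

# Names seen for only one gender are NaN in the other column.
n = n.fillna(0)

# n["sum"] selects the ('sum', '') column and returns it as a Series.
n["%F"] = n[("Count", "F")] / n["sum"] * 100
n["%M"] = n[("Count", "M")] / n["sum"] * 100
print(n)
```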

Answer 1 (score: 0)

You could do this even before computing the sum:

n.apply(lambda x: x / x.sum(), axis=1)
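A sketch of this row-wise normalization on a small hypothetical frame with plain F and M columns (the question's frame has ('Count', ...) MultiIndex columns instead, which `apply` handles the same way). It also shows `DataFrame.div` with `axis=0`, a vectorized equivalent that avoids calling a Python lambda once per row.

```python
import pandas as pd

# Hypothetical per-name counts with Gender as columns; Aaban has no
# female rows, hence the missing value.
counts = pd.DataFrame(
    {"F": [None, 5.0, 10.0], "M": [12.0, 95.0, 30.0]},
    index=pd.Index(["Aaban", "Aaden", "Aadi"], name="Name"),
)

# The answer's approach: divide each row by its own total.
# Series.sum() skips NaN, so Aaban's total is 12.
shares = counts.apply(lambda row: row / row.sum(), axis=1)

# Vectorized equivalent: divide every column by the row totals,
# aligning the totals Series on the index.
shares_vec = counts.div(counts.sum(axis=1), axis=0)

# Turn fractions into percentages; missing genders become 0%.
percent = shares_vec.fillna(0) * 100
print(percent)
```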

Answer 2 (score: 0)

It looks like the columns are a MultiIndex:

print(n.columns)
MultiIndex(levels=[[u'Count', u'sum'], [u'', u'F', u'M']],
           labels=[[0, 0, 1], [1, 2, 0]],
           names=[None, u'Gender'])

First select the columns F and M using slicers (pd.IndexSlice). Then fill the NaN values with 0 and divide by the sum column:

import pandas as pd

idx = pd.IndexSlice
F = n.loc[:, idx['Count', 'F']]
M = n.loc[:, idx['Count', 'M']]
total = n.loc[:, idx['sum', '']]  # per-name total across both genders
n['%F'] = F.fillna(0) / total * 100
n['%M'] = M.fillna(0) / total * 100
print(n)

                 Count                     sum          %F          %M
Gender               F           M
Name
Aaban              NaN   10.285710   10.285710    0.000000  100.000000
Aabfla        7.000000         NaN    7.000000  100.000000    0.000000
Aabid              NaN    5.000000    5.000000    0.000000  100.000000
Aabrielle     5.000000         NaN    5.000000  100.000000    0.000000
Aadarn             NaN    8.521739    8.521739    0.000000  100.000000
Aadan              NaN   12.000000   12.000000    0.000000  100.000000
Aadar              NaN   11.285710   11.285710    0.000000  100.000000
Aaden         5.000000  279.002857  284.002857    1.760546   98.239454
Aade               NaN    5.000000    5.000000    0.000000  100.000000
Aadhav             NaN   12.750000   12.750000    0.000000  100.000000
Aadhavan           NaN    6.333333    6.333333    0.000000  100.000000
Aadhi              NaN    6.000000    6.000000    0.000000  100.000000
Aadhira       0.888857         NaN    9.000007    9.876181    0.000000
Aadhve       79.875000         NaN   79.875000  100.000000    0.000000
Aadhven            NaN    5.000000    5.000000    0.000000  100.000000
Aadi          5.333333   55.583333   60.910007    8.756087   91.254846
Aadian             NaN    5.000000    5.000000    0.000000  100.000000
Aadil              NaN   12.913003   12.913003    0.000000  100.000000
Aadin              NaN   12.000000   12.000000    0.000000  100.000000
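None of the answers cover the last step from the question, looking up a list of names. One option, swapped in here rather than taken from the answers, is `pd.crosstab` with `normalize="index"`, which sums the counts and computes each row's shares in one call; `reindex` then looks up the requested names and returns NaN rows for names missing from the data instead of raising a KeyError as `.loc` would. The sample data and the name list (including the placeholder "Zzz") are hypothetical.

```python
import pandas as pd

# Hypothetical per-name counts (invented sample data).
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aadi", "Aadi", "Aaban"],
    "Gender": ["F", "M", "F", "M", "M"],
    "Count":  [5, 95, 10, 30, 12],
})

# Sum Count per (Name, Gender), then divide each row by its total;
# missing combinations (e.g. Aaban/F) come back as 0.
shares = pd.crosstab(
    df["Name"], df["Gender"],
    values=df["Count"], aggfunc="sum", normalize="index",
)
pct = shares * 100

# Look up a list of names; "Zzz" is a placeholder absent from the data.
names = ["Aadi", "Aaden", "Zzz"]
result = pct.reindex(names)
print(result)
```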
