pandas: calculating the proportion within subgroups

Time: 2018-06-07 14:18:30

Tags: python pandas

This is a pandas.core.series.Series:

rate         a.status             
(0.0, 0.05]  20          4                
(0.05, 0.1]  20          7
             21          2
             11          1
(0.1, 0.15]  20          2
             21          1
(0.15, 0.2]  20          2
             21          1
(0.3, 0.35]  20          2
(0.35, 0.4]  20          2
(0.45, 0.5]  20          2
(0.55, 0.6]  20          1
(0.6, 0.65]  20          1 

I want to calculate, for each rate, the proportion of entries with status 20. The result should be:

rate               proportion
(0.0, 0.05]        1      <----  4/4
(0.05, 0.1]        0.7    <----  7/(7+2+1)
(0.1, 0.15]        0.66   <----  2/(2+1)
(0.15, 0.2]        0.66   <----  2/(2+1)
(0.3, 0.35]        1
(0.35, 0.4]        1
(0.45, 0.5]        1
(0.55, 0.6]        1
(0.6, 0.65]        1

I tried [a['a.status'] == 20].count(), but it does not work. What should I do?

1 answer:

Answer 0: (score: 0)

A reproduction of your test sample:

text = """rate         a.status             
(0.0,0.05]  20          4                
(0.05,0.1]  20          7
(0.05,0.1]  21          2
(0.05,0.1]  11          1
(0.1,0.15]  20          2
(0.1,0.15]  21          1
(0.15,0.2]  20          2
(0.15,0.2]  21          1
(0.3,0.35]  20          2
(0.35,0.4]  20          2
(0.45,0.5]  20          2
(0.55,0.6]  20          1
(0.6,0.65]  20          1 """

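import io
import pandas as pd

# The header names two columns but each row has three fields, so pandas uses
# the first field (the rate interval) as the index. After reset_index() the
# columns are "index" (interval), "rate" (status code) and "a.status" (count).
d = pd.read_csv(io.StringIO(text), sep=r"\s+").reset_index()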
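
To see why the later code filters on the "rate" column rather than "a.status", it may help to inspect how the sample parses. This is a minimal sketch on a shortened copy of the sample; the names sample and parsed are only for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Shortened copy of the sample: two header names, three fields per row.
sample = """rate a.status
(0.0,0.05] 20 4
(0.05,0.1] 20 7
(0.05,0.1] 21 2
(0.05,0.1] 11 1
"""

parsed = pd.read_csv(io.StringIO(sample), sep=r"\s+").reset_index()

# The interval strings land in "index", the status codes in "rate",
# and the counts in "a.status".
print(parsed.columns.tolist())
print(parsed)
```

The column labels are shifted by one relative to the question's Series, which is why the status code is looked up under "rate".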

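Steps to perform the calculation:

1) Group by "index" (the rate interval).

2) Select the rows with the required status (held in the "rate" column of d).

3) Divide by the sum within the group.

Code: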

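# In d, the "rate" column holds the status code and "a.status" holds the count.
rate = 20  # the status code whose share we want

# For each interval: count for status 20 divided by the interval's total count.
d.groupby("index").apply(lambda x:
    x.loc[x["rate"] == rate, "a.status"] / x["a.status"].sum())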

Result:

index         
(0.0,0.05]  0     1.000000
(0.05,0.1]  1     0.700000
(0.1,0.15]  4     0.666667
(0.15,0.2]  6     0.666667
(0.3,0.35]  8     1.000000
(0.35,0.4]  9     1.000000
(0.45,0.5]  10    1.000000
(0.55,0.6]  11    1.000000
(0.6,0.65]  12    1.000000
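
As an alternative, the same proportions can probably be computed directly on a Series shaped like the one in the question, without going through read_csv. The sketch below rebuilds that Series by hand, assuming a two-level index named "rate" and "a.status" with plain string labels for the intervals (the real index may well be categorical, for example from pd.cut). The names rows, counts, totals, status_20 and proportion are illustrative only.

```python
import pandas as pd

# Rebuild the question's data: (rate interval, status, count).
rows = [
    ("(0.0, 0.05]", 20, 4),
    ("(0.05, 0.1]", 20, 7),
    ("(0.05, 0.1]", 21, 2),
    ("(0.05, 0.1]", 11, 1),
    ("(0.1, 0.15]", 20, 2),
    ("(0.1, 0.15]", 21, 1),
    ("(0.15, 0.2]", 20, 2),
    ("(0.15, 0.2]", 21, 1),
    ("(0.3, 0.35]", 20, 2),
    ("(0.35, 0.4]", 20, 2),
    ("(0.45, 0.5]", 20, 2),
    ("(0.55, 0.6]", 20, 1),
    ("(0.6, 0.65]", 20, 1),
]
index = pd.MultiIndex.from_tuples(
    [(r, s) for r, s, _ in rows], names=["rate", "a.status"]
)
counts = pd.Series([c for _, _, c in rows], index=index)

# Total count per rate interval.
totals = counts.groupby(level="rate").sum()

# Counts for status 20 only; xs drops the "a.status" level.
status_20 = counts.xs(20, level="a.status")

# Division aligns on the "rate" labels; an interval with no status-20 row
# would come out as NaN, so it is filled with 0.
proportion = (status_20 / totals).fillna(0.0)
print(proportion)
```

This avoids the per-group lambda and returns one value per rate interval, indexed by "rate" alone.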