Here is a pandas.core.series.Series:
rate         a.status
(0.0, 0.05]  20          4
(0.05, 0.1]  20          7
             21          2
             11          1
(0.1, 0.15]  20          2
             21          1
(0.15, 0.2]  20          2
             21          1
(0.3, 0.35]  20          2
(0.35, 0.4]  20          2
(0.45, 0.5]  20          2
(0.55, 0.6]  20          1
(0.6, 0.65]  20          1
I want to calculate, for each rate, the proportion of entries with status 20. The result should be:
rate proportion
(0.0, 0.05] 1 <---- 4/4
(0.05, 0.1] 0.7 <---- 7/(7+2+1)
(0.1, 0.15] 0.66 <---- 2/(2+1)
(0.15, 0.2] 0.66 <---- 2/(2+1)
(0.3, 0.35] 1
(0.35, 0.4] 1
(0.45, 0.5] 1
(0.55, 0.6] 1
(0.6, 0.65] 1
I tried [a['a.status'] == 20].count(), but it does not work. What should I do?
Answer 0 (score: 0)
A reproduction of your test sample:
import io
import pandas as pd

text = """rate a.status
(0.0,0.05] 20 4
(0.05,0.1] 20 7
(0.05,0.1] 21 2
(0.05,0.1] 11 1
(0.1,0.15] 20 2
(0.1,0.15] 21 1
(0.15,0.2] 20 2
(0.15,0.2] 21 1
(0.3,0.35] 20 2
(0.35,0.4] 20 2
(0.45,0.5] 20 2
(0.55,0.6] 20 1
(0.6,0.65] 20 1 """
# Raw string avoids an invalid-escape warning for "\s".
d = pd.read_csv(io.StringIO(text), sep=r"\s+").reset_index()
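The column names that come out of this parse are easy to misread, so here is a small sketch on a three-row excerpt of the sample. The header has two names but each row has three fields, so pandas takes the first field as the index; after `reset_index()` the interval label sits in `index`, the status code in `rate`, and the count in `a.status`. The names `sample`, `parsed` and `columns` are only for this illustration.

```python
import io
import pandas as pd

# Three-row excerpt of the sample above (illustration only).
sample = """rate a.status
(0.0,0.05] 20 4
(0.05,0.1] 20 7
(0.05,0.1] 21 2
"""

# Two header names, three fields per row: the first field becomes the index,
# and reset_index() turns it into a column called "index".
parsed = pd.read_csv(io.StringIO(sample), sep=r"\s+").reset_index()
columns = parsed.columns.tolist()
print(columns)
print(parsed)
```

This is why the calculation groups by `index` and compares the `rate` column with 20.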
Steps to perform the calculation:
1) group by index
2) select the rows with the required rate
3) divide by the sum within the group
Code:
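# Because of how the sample was parsed, the column named "rate" holds the
# status code, "a.status" holds the count, and "index" holds the interval.
rate = 20
d.groupby("index").apply(lambda x:
    x.loc[x["rate"] == rate, "a.status"] / x["a.status"].sum())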
Result:
index
(0.0,0.05] 0 1.000000
(0.05,0.1] 1 0.700000
(0.1,0.15] 4 0.666667
(0.15,0.2] 6 0.666667
(0.3,0.35] 8 1.000000
(0.35,0.4] 9 1.000000
(0.45,0.5] 10 1.000000
(0.55,0.6] 11 1.000000
(0.6,0.65] 12 1.000000
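Since the question starts from a Series rather than text, the same proportion can probably be computed on the Series directly, without the text round trip. The sketch below is an alternative to the approach above, not a restatement of it. It assumes the Series has a two-level MultiIndex named `rate` and `a.status`, and it uses plain strings for the interval labels (in the real data they may be `Interval` objects from `pd.cut`). The names `counts`, `totals`, `status_20` and `proportion` are made up for the example.

```python
import pandas as pd

# Rebuild the Series from the question (labels are strings here by assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("(0.0, 0.05]", 20),
        ("(0.05, 0.1]", 20), ("(0.05, 0.1]", 21), ("(0.05, 0.1]", 11),
        ("(0.1, 0.15]", 20), ("(0.1, 0.15]", 21),
        ("(0.15, 0.2]", 20), ("(0.15, 0.2]", 21),
        ("(0.3, 0.35]", 20),
        ("(0.35, 0.4]", 20),
        ("(0.45, 0.5]", 20),
        ("(0.55, 0.6]", 20),
        ("(0.6, 0.65]", 20),
    ],
    names=["rate", "a.status"],
)
counts = pd.Series([4, 7, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1], index=index)

# Total count per rate, summed over all statuses.
totals = counts.groupby(level="rate").sum()

# Count of status 20 per rate; rates without status 20 are filled with 0.
status_20 = counts.xs(20, level="a.status").reindex(totals.index, fill_value=0)

# Index-aligned division gives the proportion of status 20 for each rate.
proportion = status_20 / totals
print(proportion)
```

The `reindex(..., fill_value=0)` step only matters when some rate has no status 20 row at all; in this sample every rate has one.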