Question

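For drivers involved in accidents, I'm calculating the percentage of accidents in each age group as well as the cumulative percentage (for example, drivers aged 39 or younger were involved in about 64.3% of all accidents).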

The code below works, but I'm sure there is a more concise, efficient, and readable way to do this.

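import pandas as pd
df = pd.DataFrame({'Age group': ['20-29','30-39','40-49','50-59','60 and up'], 
                   'Number accidents': [10000, 8000, 6000, 3000, 1000]})
# Share of all accidents per age group, in percent
num_accidents = sum(df['Number accidents'])
df['% accidents'] = df['Number accidents'] / num_accidents * 100
# Running total of the percentages, built row by row
per_acc = 0
for i in df.index:
    per_acc += df.loc[i,'% accidents']
    df.loc[i,'% accidents accumulated'] = per_acc
print(df)  # in a notebook, a bare `df` displays the same table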

Here is the output of the code above:

   Age group  Number accidents  % accidents  % accidents accumulated
0      20-29             10000    35.714286                35.714286
1      30-39              8000    28.571429                64.285714
2      40-49              6000    21.428571                85.714286
3      50-59              3000    10.714286                96.428571
4  60 and up              1000     3.571429               100.000000

Any help with a better way to write this would be appreciated.

Answer 1

You can use cumsum; see https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Series.cumsum.html.

Then this will solve the problem:

df['% accidents accumulated'] = df['% accidents'].cumsum()
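A self-contained sketch of this answer, assuming the same sample data as the question. It adds the `import` the snippets leave out and, purely as a sanity check, compares `cumsum()` against a plain running-total loop equivalent to the one in the question (`loop_result` and `running` are illustrative names, not from the answer).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age group': ['20-29', '30-39', '40-49', '50-59', '60 and up'],
                   'Number accidents': [10000, 8000, 6000, 3000, 1000]})
df['% accidents'] = df['Number accidents'] / df['Number accidents'].sum() * 100

# Vectorized running total: replaces the explicit per-row loop
df['% accidents accumulated'] = df['% accidents'].cumsum()

# The question's loop, rewritten over plain values, kept only for comparison
loop_result = []
running = 0.0
for value in df['% accidents']:
    running += value
    loop_result.append(running)

print(df)
print(np.allclose(df['% accidents accumulated'], loop_result))  # True
```

`cumsum()` avoids both the Python-level loop and the repeated `df.loc` assignments, which is where most of the original version's overhead comes from.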

Answer 2

So far, the most efficient and clearest way to do it is:

import pandas as pd
df = pd.DataFrame({'Age group': ['20-29','30-39','40-49','50-59','60 and up'], 
                   'Number accidents': [10000, 8000, 6000, 3000, 1000]})
# Percentage per group, then its running total
df['% accidents'] = df['Number accidents'] / df['Number accidents'].sum() * 100
df['% accidents accumulated'] = df['% accidents'].cumsum()
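One possible variation, not part of the original answer: accumulate the integer counts first and divide once, so the running total is built from exact integer sums and the last row comes out as exactly 100.0 rather than a sum of floating-point percentages that may be off in the last digits. The `'30-39'` lookup shows how to read off the "aged 39 or younger" figure; `total` and `up_to_39` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Age group': ['20-29', '30-39', '40-49', '50-59', '60 and up'],
                   'Number accidents': [10000, 8000, 6000, 3000, 1000]})
total = df['Number accidents'].sum()

df['% accidents'] = df['Number accidents'] / total * 100
# Accumulate the integer counts first, then convert to a percentage once
df['% accidents accumulated'] = df['Number accidents'].cumsum() / total * 100

# Share of all accidents involving drivers aged 39 or younger (rows up to '30-39')
up_to_39 = df.loc[df['Age group'] == '30-39', '% accidents accumulated'].iloc[0]
print(f"{up_to_39:.2f}%")  # 64.29%
```

Both forms give the same table for this data; the difference only shows up in the trailing digits of the accumulated column.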

Thanks, everyone, for the help! I'd still be curious to know if there's an even better way.
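If the data were raw, with one row per accident (a hypothetical layout; the question only has pre-aggregated counts), `value_counts(normalize=True)` can produce the per-group shares directly. The sketch below rebuilds such raw rows from the question's counts purely for illustration; `counts`, `accidents`, `share`, and `summary` are hypothetical names. It also assumes the group labels sort alphabetically in age order, which holds for these labels.

```python
import pandas as pd

# Hypothetical raw data: one row per accident, rebuilt here from the counts
counts = {'20-29': 10000, '30-39': 8000, '40-49': 6000, '50-59': 3000, '60 and up': 1000}
accidents = pd.DataFrame({'Age group': [g for g, n in counts.items() for _ in range(n)]})

# normalize=True returns fractions; sort_index restores age order (alphabetical here)
share = accidents['Age group'].value_counts(normalize=True).sort_index() * 100
summary = pd.DataFrame({'% accidents': share,
                        '% accidents accumulated': share.cumsum()})
print(summary)
```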

How to calculate percentages and cumulative percentages with pandas
