Question

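I have the table below and want, for each type, the percentage of rows where Seconds is >= 10. What would be efficient, modular code for this? I would normally filter on each type and then divide, but I'm wondering whether there is a better way to compute, for each value in the Type column, the share of rows that are >= 10 seconds.

Thanks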

   Type | Seconds
     A       23
     V       10
     V       10
     A       7
     B       1
     V       10
     B       72
     A       11
     V       19
     V        3



expected output:

    type   %
     A    .67
     V    .80
     B    .50

Answer 1

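A slightly more efficient option is to build a boolean mask with Seconds.ge(10) and call groupby.mean() on the mask. The mean of a boolean Series is the fraction of True values, so no explicit division is needed: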

df.Seconds.ge(10).groupby(df.Type).mean().reset_index(name='%')

#    Type         %
# 0     A  0.666667
# 1     B  0.500000
# 2     V  0.800000
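
To run the one-liner above end to end, here is a minimal sketch that rebuilds the question's table and wraps the mask approach in a function. The function name `share_at_least` and its `threshold` parameter are my own illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Type": list("AVVABVBAVV"),
    "Seconds": [23, 10, 10, 7, 1, 10, 72, 11, 19, 3],
})


def share_at_least(df, threshold=10):
    """Fraction of rows per Type whose Seconds is >= threshold (hypothetical helper)."""
    # Boolean mask: True where the row meets the threshold.
    mask = df["Seconds"].ge(threshold)
    # Mean of booleans within each group = fraction of True values.
    return mask.groupby(df["Type"]).mean().reset_index(name="%")


result = share_at_least(df)
print(result)
```

Multiply the `%` column by 100 if you want a 0-100 scale instead of a fraction.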

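Given these functions:

# Boolean mask + groupby mean; returns fractions (0-1).
mask_groupby_mean = lambda df: df.Seconds.ge(10).groupby(df.Type).mean().reset_index(name='%')
# groupby + apply; note this one returns percentages on a 0-100 scale.
groupby_apply = lambda df: df.groupby('Type').Seconds.apply(lambda x: (x.ge(10).sum() / len(x)) * 100).reset_index(name='%')
# set_index + mean over index level 0; mean(level=0) was removed in pandas 2.0, so groupby(level=0) is used instead.
set_index_mean = lambda df: df.set_index('Type').ge(10).groupby(level=0, sort=False).mean().rename(columns={'Seconds': '%'}).reset_index()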
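
If you want to compare the three approaches yourself, something like the sketch below should work. It assumes pandas 2.x, rewrites the lambdas as named functions, and puts all three on the same 0-1 scale so the results can be checked against each other; the repeat count of 50 is arbitrary. Timings depend on your machine and data size, so none are quoted here.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Type": list("AVVABVBAVV"),
    "Seconds": [23, 10, 10, 7, 1, 10, 72, 11, 19, 3],
})


def mask_groupby_mean(df):
    # Boolean mask first, then the per-group mean of the mask.
    return df["Seconds"].ge(10).groupby(df["Type"]).mean().reset_index(name="%")


def groupby_apply(df):
    # Python-level function called once per group (fraction, not x100 here).
    return (
        df.groupby("Type")["Seconds"]
        .apply(lambda s: s.ge(10).sum() / len(s))
        .reset_index(name="%")
    )


def set_index_mean(df):
    # Move Type into the index, then average the mask over index level 0.
    return (
        df.set_index("Type")["Seconds"].ge(10)
        .groupby(level=0).mean()
        .reset_index(name="%")
    )


candidates = [mask_groupby_mean, groupby_apply, set_index_mean]
results = {f.__name__: f(df) for f in candidates}

# Rough timing; absolute numbers vary by machine and data size.
for f in candidates:
    seconds = timeit.timeit(lambda: f(df), number=50)
    print(f"{f.__name__}: {seconds:.4f}s for 50 runs")
```

On a ten-row frame the differences are mostly overhead; to see a meaningful gap, try a much larger frame, for example `pd.concat([df] * 10000, ignore_index=True)`.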

Answer 2

You can use .groupby:

x = (
    df.groupby("Type")["Seconds"]
    .apply(lambda x: (x.ge(10).sum() / len(x)) * 100)
    .reset_index(name="%")
)

print(x)

Prints:

  Type          %
0    A  66.666667
1    B  50.000000
2    V  80.000000

Answer 3

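Another option: set_index + ge, then take the mean over index level 0:

new_df = (
    df.set_index('Type')['Seconds'].ge(10)
        # mean(level=0) was removed in pandas 2.0; groupby(level=0) is the replacement.
        # sort=False keeps the groups in order of first appearance (A, V, B).
        .groupby(level=0, sort=False).mean()
        .round(2)
        .reset_index(name='%')
)

new_df: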

  Type     %
0    A  0.67
1    V  0.80
2    B  0.50

Calculating the percentage of values in a column based on a condition or value
