Question

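I have a df like the one shown below.

I'm using some simple code to filter the df on a column and then do basic arithmetic on the result. The Status column holds values such as Cancelled, Processed and Completed, and I want to calculate the percentage or count of Cancelled rows, either across the whole df or across all rows of a given color.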

The df looks like this:

   ID     Status       Color
   555    Cancelled    Green
   434    Processed    Red
   212    Cancelled    Blue
   121    Cancelled    Green
   242    Cancelled    Blue
   352    Processed    Green
   343    Processed    Blue

The code I'm currently using is:

df[df['Color'] == 'Green']

df[(df['Status']=='Cancelled') & (df['Color']=='Green')]

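For each color, I first filter the df manually to get the row count, then filter a second time (as above) to get the number of Cancelled rows, i.e. cancelled orders, and finally divide that number by hand by the row count for the color (here, the number of Green rows).

If I wanted to create a function where I can plug in a color name and a status and have the math done in one simple function, what would be the best way to do it?

The expected output would look something like: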

 Status      Green
Cancelled    0.666667
Processed    0.333333
dtype: float64

Thanks a lot!

Answer 1

You can use groupby and len():

df.groupby(by='Status').apply(lambda x: len(x)/len(df))

Status
Cancelled    0.571429
Processed    0.428571
dtype: float64
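
As a possible alternative to apply, value_counts(normalize=True) returns the share of each Status directly. The sketch below assumes the df from the question, rebuilt here only so the snippet runs on its own. Restricting it to the Green rows gives the 0.666667 / 0.333333 split the question asks for.

```python
import pandas as pd

# Sample data copied from the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "ID": [555, 434, 212, 121, 242, 352, 343],
    "Status": ["Cancelled", "Processed", "Cancelled", "Cancelled",
               "Cancelled", "Processed", "Processed"],
    "Color": ["Green", "Red", "Blue", "Green", "Blue", "Green", "Blue"],
})

# Share of each Status over all rows of df
overall = df["Status"].value_counts(normalize=True)

# Share of each Status among the Green rows only
green = df.loc[df["Color"] == "Green", "Status"].value_counts(normalize=True)

print(overall)
print(green)
```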

Broken down by status and color:

cc = df.groupby(by='Color').ID.count()
df.groupby(by=['Color', 'Status']).apply(lambda x: len(x)/cc.loc[x.Color.iloc[0]])

Color  Status   
Blue   Cancelled    0.666667
       Processed    0.333333
Green  Cancelled    0.666667
       Processed    0.333333
Red    Processed    1.000000
dtype: float64
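
To wrap this in a function, as the question asks, something along the following lines might work. This is only a sketch: the name status_share is hypothetical, the sample df is rebuilt from the question, and returning NaN for a color with no rows is an assumption about the desired behavior. pd.crosstab with normalize="columns" is shown as one way to get every color at once.

```python
import pandas as pd

# Sample data copied from the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "ID": [555, 434, 212, 121, 242, 352, 343],
    "Status": ["Cancelled", "Processed", "Cancelled", "Cancelled",
               "Cancelled", "Processed", "Processed"],
    "Color": ["Green", "Red", "Blue", "Green", "Blue", "Green", "Blue"],
})


def status_share(df, color, status):
    """Fraction of the rows with the given Color whose Status equals status.

    Hypothetical helper; returns NaN when no row has that color.
    """
    subset = df[df["Color"] == color]
    if subset.empty:
        return float("nan")
    # The mean of a boolean Series is the fraction of True values
    return (subset["Status"] == status).mean()


# All colors at once: rows are Status, columns are Color,
# and each column sums to 1
table = pd.crosstab(df["Status"], df["Color"], normalize="columns")

print(status_share(df, "Green", "Cancelled"))
print(table["Green"])
```

Here table["Green"] has the same shape as the expected output in the question, with one row per Status.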

Create a function to filter and calculate row proportions based on the filter?
