Question

我的数据框格为People，Days和Types。数据并不真正有意义，它只是一个例子。

我希望先在People然后Type进行分组，然后找到小于或等于3的天数百分比。

为了做到这一点，我创建了Boolean列等于或不到3天。然后应用count和sum聚合。我不是这种方法的忠实粉丝，因为count列实际上只需要Days而sum列只需要Under Day Limit。此方法实际上创建了两个不必要的列，并创建了许多额外的步骤。如何清理此代码，以便在更大的数据集上更有效地运行。

import pandas as pd
# create dataframe
df = pd.DataFrame(data=[['A', 4, 'Type 1'],
                        ['A', 1, 'Type 1'],
                        ['A', 3, 'Type 2'],
                        ['A', 0, 'Type 1'],
                        ['A', 12, 'Type 2'],
                        ['B', 1, 'Type 1'],
                        ['B', 3, 'Type 1'],
                        ['B', 5, 'Type 2']],
                   columns=['Person', 'Days', 'Type'])

df['Under Day Limit'] = df['Days'] <= 3;
print df
df = df.groupby(['Person', 'Type']).agg(['count', 'sum'])

df['Percent under Day Limit'] = df['Under Day Limit']['sum'] / df['Days']['count']
print df

输出继电器：

               Days     Under Day Limit     Percent under Day Limit
              count sum           count sum                        
Person Type                                                        
A      Type 1     3   5               3   2                0.666667
       Type 2     2  15               2   1                0.500000
B      Type 1     2   4               2   2                1.000000
       Type 2     1   5               1   0                0.000000

Answer 1

set_index

Person

Type
布尔系列Days＆gt; = 3
groupby级别
value_counts(normalize=True)

df.set_index(['Person', 'Type']).Days.ge(3).groupby(level=[0, 1]).value_counts(True)

Person  Type    Days 
A       Type 1  False    0.666667
                True     0.333333
        Type 2  True     1.000000
B       Type 1  False    0.500000
                True     0.500000
        Type 2  True     1.000000
Name: Days, dtype: float64

进行更多格式化

df.set_index(['Person', 'Type']).Days.rename('>= 3').ge(3)  \
  .groupby(level=[0, 1]).value_counts(True).unstack(fill_value=0)

获得满足Pandas组内条件的值的百分比

1 个答案: