Question

I have a DataFrame that looks like this:

 ID Acuity TOTAL_ED_LOS
 1    2      423
 2    5      52
 3    5      535
 4    1      87
 ...

I want to produce a table that looks like this:

 Acuity   Count   Median   Percentile_25   Percentile_75   % of total
   1       234    ...                                          31%
   2        65    ...                                           8%
   3        56    ...                                           7%
   4       345    ...                                          47%
   5        35    ...                                           5%

I already have code that gives me everything I need except the percent-of-total column:

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

df_grp = df_merged_v1.groupby(['Acuity'])
df_grp['TOTAL_ED_LOS'].agg(['count','median', 
                                  percentile(25), percentile(75)]).reset_index()

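Is there an efficient way to add the percent-of-total column? The link below contains code for getting a percentage of the total, but I'm not sure how to apply it to my code. I know I could build two tables and merge them, but I'm curious whether there is a cleaner way.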

How to calculate count and percentage in groupby in Python

Answer 1

Here is one way to do it using some of pandas' built-in tools:

import numpy as np
import pandas as pd

# Set the random number seed and create a dummy dataframe with two columns
np.random.seed(123)
df = pd.DataFrame({'activity':np.random.choice([*'ABCDE'], 40), 
                   'TOTAL_ED_LDS':np.random.randint(50, 500, 40)})

# Reshape the dataframe so that each activity gets its own column,
# then transpose the output of describe() to get one row per activity
df_out = df.set_index([df.groupby('activity').cumcount(),'activity'])['TOTAL_ED_LDS']\
           .unstack().describe().T

# Calculate each group's count as a percentage of the total count
df_out['% of Total'] = df_out['count'] / df_out['count'].sum() * 100.
df_out

Output:

          count        mean         std    min     25%    50%     75%    max  % of Total
activity                                                                                
A           8.0  213.125000  106.810162   93.0  159.50  200.0  231.75  421.0        20.0
B          10.0  308.200000  116.105125   68.0  240.75  324.5  376.25  461.0        25.0
C           6.0  277.666667  117.188168  114.0  193.25  311.5  352.50  409.0        15.0
D           7.0  370.285714  124.724649  120.0  337.50  407.0  456.00  478.0        17.5
E           9.0  297.000000  160.812002   51.0  233.00  294.0  415.00  488.0        22.5
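
The same count-divided-by-total step can also be applied directly to the aggregation from the question, without reshaping or merging two tables. The following is only a sketch: the small random frame stands in for df_merged_v1 (its contents are an assumption, not the asker's data), while the percentile helper and column names are taken from the question.

```python
import numpy as np
import pandas as pd


def percentile(n):
    """Build an aggregator for the n-th percentile, named 'percentile_<n>'."""
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_


# Hypothetical stand-in for the question's df_merged_v1 (made-up values)
rng = np.random.default_rng(0)
df_merged_v1 = pd.DataFrame({
    'Acuity': rng.integers(1, 6, 200),
    'TOTAL_ED_LOS': rng.integers(30, 600, 200),
})

# Same aggregation as in the question
summary = (
    df_merged_v1.groupby('Acuity')['TOTAL_ED_LOS']
    .agg(['count', 'median', percentile(25), percentile(75)])
    .reset_index()
)

# Each group's share of all rows, as a percentage
summary['% of total'] = summary['count'] / summary['count'].sum() * 100
print(summary)
```

Because the percentage is derived from the count column that the aggregation already produces, no second groupby or merge should be needed.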
