How to compute a custom aggregation after a pandas groupby

Date: 2020-10-07 15:37:00

Tags: python pandas numpy pandas-groupby aggregate

I have a dataframe df like the one below:

ID   COMMODITY_CODE   DELIVERY_TYPE  DAY   Window_start_time     case_qty     deliveries
6042.0      SCGR        Live         1.0    15:00                 15756.75    7.75
6042.0      SCGR        Live         1.0    18:00                 15787.75    5.75
6042.0      SCGR        Live         1.0    21:00                 10989.75    4.75
6042.0      SCGR        Live         2.0    15:00                 21025.25    9.00
6042.0      SCGR        Live         2.0    18:00                 16041.75    5.75

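I want to group by ID, COMMODITY_CODE, DELIVERY_TYPE and DAY and compute dlvry_ratio and case_qty_ratio, i.e. each row's deliveries and case_qty divided by the corresponding total for its group, as in the desired output below: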

ID   COMMODITY_CODE   DELIVERY_TYPE  DAY  case_qty   deliveries dlvry_ratio case_qty_ratio
6042.0      SCGR        Live         1.0   15756.75   7.75         0.42          0.37
6042.0      SCGR        Live         1.0   15787.75   5.75         0.32          0.37
6042.0      SCGR        Live         1.0   10989.75   4.75         0.26          0.26
6042.0      SCGR        Live         2.0   21025.25   9.00         0.61          0.57
6042.0      SCGR        Live         2.0   16041.75   5.75         0.39          0.43
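To make the definition concrete: each ratio is a row's value divided by the sum of that column over its (ID, COMMODITY_CODE, DELIVERY_TYPE, DAY) group. A plain-Python check of that arithmetic for the three DAY 1.0 rows (values copied from the sample data), rounded to two decimals:

```python
# DAY 1.0 rows of the sample data (same ID / COMMODITY_CODE / DELIVERY_TYPE)
case_qty = [15756.75, 15787.75, 10989.75]
deliveries = [7.75, 5.75, 4.75]

# Each row's share of its group's total
case_qty_ratio = [q / sum(case_qty) for q in case_qty]
dlvry_ratio = [d / sum(deliveries) for d in deliveries]

print([round(r, 2) for r in case_qty_ratio])  # [0.37, 0.37, 0.26]
print([round(r, 2) for r in dlvry_ratio])     # [0.42, 0.32, 0.26]
```

Within a group the ratios of each column add up to 1.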

I tried to compute this with lambda functions inside agg, as in the code below:

df.groupby(['ID', 'COMMODITY_CODE', 'DELIVERY_TYPE', 'DAY'], as_index=False) \
  .agg(delivery_ratio=("deliveries", lambda x: x / x.sum()),
       case_ratio=("case_qty", lambda x: x / x.sum()))

But this didn't work. Any help would be appreciated.
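Why the attempt fails: agg expects every function to reduce a group to a single value, while x / x.sum() returns one value per row. A minimal sketch contrasting agg and transform, assuming a reasonably recent pandas and a small hypothetical frame with one grouping column `key` and one value column `val` (both names are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame: one grouping column, one value column
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 3.0, 2.0]})

# agg must reduce each group to one value; a lambda that returns one
# value per row is rejected and pandas raises an error
try:
    df.groupby("key").agg(ratio=("val", lambda x: x / x.sum()))
    agg_failed = False
except (TypeError, ValueError):
    agg_failed = True

# transform keeps the original row index, so per-row ratios fit back into df
df["ratio"] = df.groupby("key")["val"].transform(lambda x: x / x.sum())

print(agg_failed)            # True
print(df["ratio"].tolist())  # [0.25, 0.75, 1.0]
```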

2 answers:

Answer 0 (score: 2)

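Use transform instead of agg. transform returns one value per original row, so the ratios can be assigned straight back to df: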

df[['case_ratio', 'delivery_ratio']] = df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'], 
                                                   as_index=False)[['case_qty', 'deliveries']]\
                                          .transform(lambda x: x/x.sum())

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  deliveries  case_ratio   delivery_ratio
0  6042.0           SCGR          Live  1.0             15:00  15756.75        7.75     0.370449        0.424658
1  6042.0           SCGR          Live  1.0             18:00  15787.75        5.75     0.371177        0.315068
2  6042.0           SCGR          Live  1.0             21:00  10989.75        4.75     0.258374        0.260274
3  6042.0           SCGR          Live  2.0             15:00  21025.25        9.00     0.567223        0.610169
4  6042.0           SCGR          Live  2.0             18:00  16041.75        5.75     0.432777        0.389831
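A self-contained sketch of this answer that can be run as-is, rebuilding the sample frame from the question's table (storing Window_start_time as plain strings is an assumption about its dtype). It leaves out as_index=False, which only affects aggregated output and is not needed for transform, and assigns the two ratio columns by name rather than by position:

```python
import pandas as pd

# Sample data from the question (Window_start_time kept as plain strings)
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# transform returns one value per original row, so the result shares
# df's index and the ratio columns can be assigned directly
ratios = df.groupby(keys)[["case_qty", "deliveries"]].transform(lambda x: x / x.sum())
df["case_ratio"] = ratios["case_qty"]
df["delivery_ratio"] = ratios["deliveries"]

print(df[["DAY", "case_ratio", "delivery_ratio"]].round(2))
```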

Answer 1 (score: 1)

Similar to Scott's answer, but use transform('sum') to get the group totals and then divide by them:

cols = ['case_qty', 'deliveries']
df = df.join(df[cols].div(df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'])
                            [cols].transform('sum')
                         )
                     .add_suffix('_ratio')
            )

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  \
0  6042.0           SCGR          Live  1.0             15:00  15756.75   
1  6042.0           SCGR          Live  1.0             18:00  15787.75   
2  6042.0           SCGR          Live  1.0             21:00  10989.75   
3  6042.0           SCGR          Live  2.0             15:00  21025.25   
4  6042.0           SCGR          Live  2.0             18:00  16041.75   

   deliveries  case_qty_ratio  deliveries_ratio  
0        7.75        0.370449          0.424658  
1        5.75        0.371177          0.315068  
2        4.75        0.258374          0.260274  
3        9.00        0.567223          0.610169  
4        5.75        0.432777          0.389831
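The same idea as a runnable sketch on the question's sample data. Passing the string 'sum' lets pandas use its built-in group sum instead of calling a Python lambda once per group, which is typically faster when there are many groups. The last lines check that the result agrees with the lambda-based transform from the other answer:

```python
import pandas as pd

# Sample data from the question (only the columns needed for the ratios)
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]
cols = ["case_qty", "deliveries"]

# Group totals broadcast back to every row, then an element-wise division
group_totals = df.groupby(keys)[cols].transform("sum")
df = df.join(df[cols].div(group_totals).add_suffix("_ratio"))

# Cross-check against the lambda-based transform
via_lambda = df.groupby(keys)[cols].transform(lambda x: x / x.sum())
max_diff = max(
    (df["case_qty_ratio"] - via_lambda["case_qty"]).abs().max(),
    (df["deliveries_ratio"] - via_lambda["deliveries"]).abs().max(),
)

print(df[["DAY", "case_qty_ratio", "deliveries_ratio"]].round(2))
print(max_diff < 1e-12)  # True
```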