How to count the rows in each group of a pandas DataFrame and add the count to the original data

Time: 2020-10-07 17:07:54

Tags: python pandas numpy pandas-groupby aggregate-functions

I have a DataFrame df that looks like this:

ID      COMMODITY_CODE  DELIVERY_TYPE  DAY  Window_start  case_qty  deliveries
6042.0  SCGR            Live           1.0  15:00         15756.75  7.75
6042.0  SCGR            Live           1.0  18:00         15787.75  5.75
6042.0  SCGR            Live           1.0  21:00         10989.75  4.75
6042.0  SCGR            Live           2.0  15:00         21025.25  9.00
6042.0  SCGR            Live           2.0  18:00         16041.75  5.75

I want the output below, where I group by ID, COMMODITY_CODE, DELIVERY_TYPE, and DAY and compute window_count as the number of rows in each group:

ID      COMMODITY_CODE  DELIVERY_TYPE  DAY  Window_start  window_count  case_qty  deliveries
6042.0  SCGR            Live           1.0  15:00         3             15756.75  7.75
6042.0  SCGR            Live           1.0  18:00         3             15787.75  5.75
6042.0  SCGR            Live           1.0  21:00         3             10989.75  4.75
6042.0  SCGR            Live           2.0  15:00         2             21025.25  9.00
6042.0  SCGR            Live           2.0  18:00         2             16041.75  5.75

I tried the following code using agg:

df = df.groupby(['ID','CHAMBER_TYPE','COMMODITY_CODE','DELIVERY_TYPE','DAY'],as_index=False)\
                     .agg(window_count=("DAY", "count"))

Although it does count the windows for each ID, COMMODITY_CODE, DELIVERY_TYPE, DAY group, it drops the other existing columns, such as Window_start, case_qty, and deliveries.

That is, instead of the desired output I get the following:

ID      COMMODITY_CODE  DELIVERY_TYPE  DAY  window_count
6042.0  SCGR            Live           1.0  3
6042.0  SCGR            Live           1.0  3
6042.0  SCGR            Live           1.0  3
6042.0  SCGR            Live           2.0  2
6042.0  SCGR            Live           2.0  2

1 Answer:

Answer 0 (score: 0)

You are looking for transform, which returns one value per original row instead of one row per group:

df['window_count'] = df.groupby(['ID','CHAMBER_TYPE','COMMODITY_CODE','DELIVERY_TYPE','DAY'])['ID'].transform('size')

By the way, there is no 'CHAMBER_TYPE' column in your sample data.
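
For reference, here is a minimal self-contained sketch of the same idea, rebuilt from the sample rows in the question. It assumes 'CHAMBER_TYPE' can simply be left out of the grouping keys, since it is absent from the sample; the name group_cols and the choice of Window_start as the selected column are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; CHAMBER_TYPE is omitted
# because it does not appear in the sample data (an assumption).
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

# Hypothetical helper name for the grouping keys.
group_cols = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# transform broadcasts the group size back onto every row of the group,
# so the frame keeps all of its original rows and columns.
df["window_count"] = df.groupby(group_cols)["Window_start"].transform("size")

print(df)
```

Note that 'size' counts every row in the group, while 'count' would skip rows where the selected column is missing, so the two can differ if Window_start contains NaN.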