I have a DataFrame df like the one below:
ID COMMODITY_CODE DELIVERY_TYPE DAY Window_start case_qty deliveries
6042.0 SCGR Live 1.0 15:00 15756.75 7.75
6042.0 SCGR Live 1.0 18:00 15787.75 5.75
6042.0 SCGR Live 1.0 21:00 10989.75 4.75
6042.0 SCGR Live 2.0 15:00 21025.25 9.00
6042.0 SCGR Live 2.0 18:00 16041.75 5.75
I want the output below, where I group by ID, COMMODITY_CODE, DELIVERY_TYPE, and DAY and compute window_count as the number of rows in each group:
ID COMMODITY_CODE DELIVERY_TYPE DAY Window_start window_count case_qty deliveries
6042.0 SCGR Live 1.0 15:00 3 15756.75 7.75
6042.0 SCGR Live 1.0 18:00 3 15787.75 5.75
6042.0 SCGR Live 1.0 21:00 3 10989.75 4.75
6042.0 SCGR Live 2.0 15:00 2 21025.25 9.00
6042.0 SCGR Live 2.0 18:00 2 16041.75 5.75
I tried the following code using agg:
df = df.groupby(['ID','CHAMBER_TYPE','COMMODITY_CODE','DELIVERY_TYPE','DAY'],as_index=False)\
.agg(window_count=("DAY", "count"))
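Although this counts the windows for each ID, COMMODITY_CODE, DELIVERY_TYPE, DAY group, it drops the other columns, such as Window_start, case_qty, and deliveries.
That is, I get the output below instead of the desired one: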
ID COMMODITY_CODE DELIVERY_TYPE DAY window_count
6042.0 SCGR Live 1.0 3
6042.0 SCGR Live 1.0 3
6042.0 SCGR Live 1.0 3
6042.0 SCGR Live 2.0 2
6042.0 SCGR Live 2.0 2
Answer 0 (score: 0)
You are looking for transform, which returns one value per original row instead of one row per group:
df['window_count'] = df.groupby(['ID','CHAMBER_TYPE','COMMODITY_CODE','DELIVERY_TYPE','DAY'])['ID'].transform('size')
By the way, the sample data has no 'CHAMBER_TYPE' column.
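As a minimal, self-contained sketch of the transform approach, the snippet below rebuilds the sample data from the question and groups only by the four columns that actually appear in it (the 'CHAMBER_TYPE' key is left out on the assumption that it exists only in the real data). The name group_cols is introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

# Assumption: group only by columns present in the sample data.
group_cols = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# transform("size") broadcasts each group's row count back onto every
# row of that group, so no rows or columns are dropped.
df["window_count"] = df.groupby(group_cols)["Window_start"].transform("size")

print(df)
```

With this data, window_count is 3 for the DAY 1.0 rows and 2 for the DAY 2.0 rows. The new column is appended at the end of the frame; if it needs to sit right after Window_start as in the desired output, the columns can be reordered afterwards. Note that "size" counts every row in the group, whereas "count" skips missing values in the selected column.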