Question

Category | Unit | ID | Time | isReq
  1         A     x1    t1      0 
  1         A     x1    t2      0 
  1         A     x1    t3      0 
  1         A     x1    t4      0
  1         B     x2    t5      1 
  1         B     x2    t6      0 
  1         B     x2    t7      0

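I'm trying to find the number of unique IDs for each Category and Unit, along with their average duration. The final result should look like this: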

Category | Unit | ID_count | time_diff
   1         A        1        mean time to completion over all IDs in that Category and Unit

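A given ID and Category can have more than one Unit. How should I process the data to get this result? I can get the unique Units grouped by ID and Category on their own, but I'm struggling to get all the fields in a single query.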

Edit: I managed to do what I needed with the following:

 df1.groupby(['Category','Unit','ID'])['Time'].agg(['first','last']).diff(axis = 1).iloc[:,-1].reset_index().groupby(['Category','Unit']).agg({'ID' : 'count','last' : pd.Series.mean})
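To make the one-liner above easier to follow, here is a sketch that splits it into its steps. It assumes Time is a datetime column; the concrete timestamps standing in for t1…t7 are hypothetical, chosen only so the result can be checked.

```python
import pandas as pd

# Sample data from the question. The timestamps replacing t1..t7 are
# hypothetical values chosen for illustration.
df1 = pd.DataFrame({
    "Category": [1] * 7,
    "Unit": ["A"] * 4 + ["B"] * 3,
    "ID": ["x1"] * 4 + ["x2"] * 3,
    "Time": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:20",
        "2021-01-01 00:30", "2021-01-01 01:00", "2021-01-01 01:15",
        "2021-01-01 01:45",
    ]),
    "isReq": [0, 0, 0, 0, 1, 0, 0],
})

# Step 1: first and last Time of each ID (in row order) within its Category/Unit.
per_id = df1.groupby(["Category", "Unit", "ID"])["Time"].agg(["first", "last"])

# Step 2: time to completion per ID; this is what diff(axis=1).iloc[:, -1] yields.
per_id["duration"] = per_id["last"] - per_id["first"]

# Step 3: one row per Category/Unit with the number of IDs and the mean duration.
result = (
    per_id.reset_index()
    .groupby(["Category", "Unit"])
    .agg(ID_count=("ID", "nunique"), mean_duration=("duration", "mean"))
)
print(result)
```

Because "first" and "last" follow row order, this assumes the rows are sorted by Time within each ID; "min" and "max" drop that assumption. After step 1 there is exactly one row per (Category, Unit, ID), so counting IDs there gives the same number as nunique, which is why 'count' works in the original.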

I'm now trying to compute, in the same query as above, each ID's contribution as a percentage of the total isReq. Any suggestions are welcome.
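One way to fold isReq into the same query is to aggregate it per ID alongside the timestamps and then divide by a total. The question does not say which total is meant; this sketch assumes the sum of isReq over the whole DataFrame, and the timestamps are again hypothetical.

```python
import pandas as pd

# Same sample data; the timestamps replacing t1..t7 are hypothetical.
df1 = pd.DataFrame({
    "Category": [1] * 7,
    "Unit": ["A"] * 4 + ["B"] * 3,
    "ID": ["x1"] * 4 + ["x2"] * 3,
    "Time": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:20",
        "2021-01-01 00:30", "2021-01-01 01:00", "2021-01-01 01:15",
        "2021-01-01 01:45",
    ]),
    "isReq": [0, 0, 0, 0, 1, 0, 0],
})

# One pass per ID: start, end and the number of rows flagged isReq == 1.
per_id = (
    df1.groupby(["Category", "Unit", "ID"])
    .agg(start=("Time", "first"), end=("Time", "last"), isReq=("isReq", "sum"))
    .reset_index()
)
per_id["duration"] = per_id["end"] - per_id["start"]

# Assumption: "total isReq" is the sum over the whole DataFrame.
# For a per-Category total, use per_id.groupby("Category")["isReq"].transform("sum").
total_req = per_id["isReq"].sum()
per_id["isReq_pct"] = 100 * per_id["isReq"] / total_req

# Roll up to Category/Unit; the percentages of the IDs in each group are summed.
summary = per_id.groupby(["Category", "Unit"]).agg(
    ID_count=("ID", "nunique"),
    mean_duration=("duration", "mean"),
    isReq_pct=("isReq_pct", "sum"),
)
print(per_id[["Category", "Unit", "ID", "isReq_pct"]])
print(summary)
```

If no row has isReq set, total_req is 0 and the percentages come out as NaN, so that case may need a guard.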

Answer 1

You could try:

df.groupby(['Category', 'Unit']).agg(ID_count=('ID','nunique'), time_diff=('Time', 'mean'))
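The named-aggregation syntax above (available since pandas 0.25) handles the ID count, but ('Time', 'mean') averages the timestamps themselves, which gives a point in time rather than a time to completion. A sketch that keeps the same syntax but averages per-ID durations, using min/max so row order does not matter; the timestamps are hypothetical and the rows are deliberately shuffled:

```python
import pandas as pd

# Sample data with hypothetical timestamps, listed out of order on purpose.
df = pd.DataFrame({
    "Category": [1] * 7,
    "Unit": ["A", "B", "A", "A", "B", "A", "B"],
    "ID": ["x1", "x2", "x1", "x1", "x2", "x1", "x2"],
    "Time": pd.to_datetime([
        "2021-01-01 00:20", "2021-01-01 01:45", "2021-01-01 00:00",
        "2021-01-01 00:30", "2021-01-01 01:00", "2021-01-01 00:10",
        "2021-01-01 01:15",
    ]),
})

# Collapse to one row per ID; min/max do not depend on row order.
per_id = df.groupby(["Category", "Unit", "ID"], as_index=False).agg(
    start=("Time", "min"), end=("Time", "max")
)
per_id["duration"] = per_id["end"] - per_id["start"]

# Same named aggregation as the answer, applied to durations instead of Time.
out = per_id.groupby(["Category", "Unit"]).agg(
    ID_count=("ID", "nunique"), time_diff=("duration", "mean")
)
print(out)
```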
