I have a dataframe df that looks like this:
   NETWORK  config_id APPLICABLE_DAYS  Case  Delivery
0  Grocery       5399             SUN    10         1
1  Grocery       5399             MON    20         2
2  Grocery       5399             TUE    30         3
3  Grocery       5399             WED    40         4
I want to implode it (combine APPLICABLE_DAYS from multiple rows into a single row, as shown below) and get the average cases and deliveries for each config_id:
   NETWORK  config_id  APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  SUN,MON,TUE,WED         90            10
Using groupby on NETWORK and config_id, I can get the average cases and average deliveries like this:
df.groupby(['NETWORK','config_id']).agg({'Case':'mean','Delivery':'mean'})
But how can I also join APPLICABLE_DAYS while performing this aggregation?
Answer 0 (score: 2)
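If you want the "opposite" of explode, that means collecting the values into a list, as in Solution 1. You can also join them into a single string, as in Solution 2.
Solution 1: use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column inside the .agg call on the groupby:
df = (df.groupby(['NETWORK','config_id'])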
.agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5
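To try Solution 1 end to end, here is a minimal sketch. It rebuilds the sample frame from the question (the constructor call and dtypes are my assumption, since the question only shows the printed frame) and passes the built-in list instead of the lambda, which should give the same result. The variable name imploded is just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are assumed).
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

imploded = (
    df.groupby(["NETWORK", "config_id"])
      # list collects each group's days into a Python list
      .agg({"APPLICABLE_DAYS": list, "Case": "mean", "Delivery": "mean"})
      .rename(columns={"Case": "Avg_Cases", "Delivery": "Avg_Delivery"})
      .reset_index()
)
print(imploded)
```

Keeping the days as a list is handy if you may want to call explode on the column again later.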
Solution 2: use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column inside the .agg call on the groupby:
df = (df.groupby(['NETWORK','config_id'])
.agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
   NETWORK  config_id  APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  SUN,MON,TUE,WED         25           2.5
If you are looking for the sum rather than the mean, just change mean to sum for the 'Case' and 'Delivery' columns.
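As a rough sketch of that sum variant, assuming the same sample data as in the question (the frame construction and the Total_Cases / Total_Delivery / totals names are my own choices, not part of the answer above):

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are assumed).
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

totals = (
    df.groupby(["NETWORK", "config_id"])
      # ",".join works here because every day value is a string
      .agg({"APPLICABLE_DAYS": ",".join, "Case": "sum", "Delivery": "sum"})
      .rename(columns={"Case": "Total_Cases", "Delivery": "Total_Delivery"})
      .reset_index()
)
print(totals)
```

Note that ",".join raises a TypeError if the column contains anything other than strings (for example NaN), so it only fits columns that are fully populated with text.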
Answer 1 (score: 1)
Your expected result looks more like a sum than a mean; the solution below uses named aggregation:
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Total_Cases=("Case", "sum"),
Total_Delivery=("Delivery", "sum"),
)
                   APPLICABLE_DAYS  Total_Cases  Total_Delivery
NETWORK config_id
Grocery 5399       SUN,MON,TUE,WED          100              10
If you do want the mean, change 'sum' to 'mean':
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Avg_Cases=("Case", "mean"),
Avg_Delivery=("Delivery", "mean"),
)
                   APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
NETWORK config_id
Grocery 5399       SUN,MON,TUE,WED         25           2.5
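The named-aggregation results above keep NETWORK and config_id in the index. If you need them as ordinary columns, as in the layout the question asks for, a sketch like the following may help. The join_days helper is hypothetical (it is not part of pandas); it drops missing values and casts to str before joining, on the assumption that real data could contain gaps. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are assumed).
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

def join_days(days: pd.Series) -> str:
    """Hypothetical helper: join non-missing values with commas."""
    return ",".join(days.dropna().astype(str))

result = (
    df.groupby(["NETWORK", "config_id"])
      .agg(
          APPLICABLE_DAYS=("APPLICABLE_DAYS", join_days),
          Avg_Cases=("Case", "mean"),
          Avg_Delivery=("Delivery", "mean"),
      )
      .reset_index()  # move the group keys back into regular columns
)
print(result)
```

With the sample data the helper behaves the same as ",".join; the extra dropna and astype calls only matter when the column has missing or non-string values.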