How to implode based on one column (reverse of pandas explode)

Time: 2020-10-06 23:54:43

Tags: pandas numpy explode implode

I have a dataframe df as shown below:

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10       1        
1   Grocery     5399            MON               20       2       
2   Grocery     5399            TUE               30       3        
3   Grocery     5399            WED               40       4       

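I want to implode it (combine the APPLICABLE_DAYS values from multiple rows into a single row, as shown below) and get the average Case and Delivery for each config_id: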

  NETWORK       config_id       APPLICABLE_DAYS      Avg_Cases    Avg_Delivery 
0   Grocery     5399            SUN,MON,TUE,WED         90           10

Using groupby on NETWORK and config_id, I can get Avg_Cases and Avg_Delivery like this:

    df.groupby(['NETWORK','config_id']).agg({'Case':'mean','Delivery':'mean'})

But how can I also join the APPLICABLE_DAYS values while performing this aggregation?
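For reference, a minimal sketch that rebuilds the sample frame from the table above and runs the mean aggregation using the column names as they actually appear in the frame (NETWORK, Case, Delivery). The variable name `means` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# Numeric part of the aggregation; column names must match the frame's case exactly
means = df.groupby(["NETWORK", "config_id"]).agg({"Case": "mean", "Delivery": "mean"})
print(means)
```

The open question is what to put in the dict for APPLICABLE_DAYS, since a numeric function such as "mean" does not apply to strings.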

2 answers:

Answer 0 (score: 2)

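If you want the "opposite" of explode, that means putting the values into a list, as in Solution 1. You can also join them into a string, as in Solution 2: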

Solution 1: use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column inside the .agg call of the groupby:

    df = (df.groupby(['NETWORK','config_id'])
            .agg({'APPLICABLE_DAYS': lambda x: x.tolist(), 'Case': 'mean', 'Delivery': 'mean'})
            .rename({'Case': 'Avg_Cases', 'Delivery': 'Avg_Delivery'}, axis=1)
            .reset_index())
    df
    Out[1]:
       NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5

Solution 2: use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column inside the .agg call of the groupby:

    df = (df.groupby(['NETWORK','config_id'])
            .agg({'APPLICABLE_DAYS': lambda x: ",".join(x), 'Case': 'mean', 'Delivery': 'mean'})
            .rename({'Case': 'Avg_Cases', 'Delivery': 'Avg_Delivery'}, axis=1)
            .reset_index())
    df
    Out[1]:
       NETWORK  config_id  APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399  SUN,MON,TUE,WED         25           2.5

If you are looking for the sum instead, just change 'mean' to 'sum' for the Case and Delivery columns in .agg.
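To check the claim that collecting into a list is the "opposite" of explode, here is a minimal round-trip sketch on the question's sample data. It uses the built-in `list` as the aggregation function (equivalent to `lambda x: x.tolist()` here) and assumes pandas 1.1 or later for the `ignore_index` argument of `explode`.

```python
import pandas as pd

df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# "Implode": one list of days per (NETWORK, config_id) group
imploded = (
    df.groupby(["NETWORK", "config_id"])
      .agg({"APPLICABLE_DAYS": list, "Case": "mean", "Delivery": "mean"})
      .reset_index()
)

# explode() turns each list element back into its own row
exploded = imploded.explode("APPLICABLE_DAYS", ignore_index=True)
print(exploded)
```

Only the list column is restored: the exploded rows all carry the group mean (25.0) in Case rather than the original per-day values, so the numeric aggregation is not reversed.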

Answer 1 (score: 1)

Your expected result looks more like a sum than a mean; the solution below uses named aggregation:

    df.groupby(["NETWORK", "config_id"]).agg(
        APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
        Total_Cases=("Case", "sum"),
        Total_Delivery=("Delivery", "sum"),
    )

                        APPLICABLE_DAYS       Total_Cases   Total_Delivery
NETWORK config_id           
Grocery 5399                SUN,MON,TUE,WED           100      10

If it is meant to be the mean, you can change 'sum' to 'mean':

    df.groupby(["NETWORK", "config_id"]).agg(
        APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
        Avg_Cases=("Case", "mean"),
        Avg_Delivery=("Delivery", "mean"),
    )

                    APPLICABLE_DAYS   Avg_Cases Avg_Delivery
NETWORK config_id           
Grocery 5399         SUN,MON,TUE,WED      25      2.5
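A sketch of the same named aggregation on more than one group, computing the sum and the mean of Case in a single call. The second config_id (5400) and its rows are hypothetical, added only to show that each group gets its own row and its own joined string.

```python
import pandas as pd

# Question's sample data plus a hypothetical config_id 5400
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 6,
    "config_id": [5399, 5399, 5399, 5399, 5400, 5400],
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED", "THU", "FRI"],
    "Case": [10, 20, 30, 40, 5, 15],
    "Delivery": [1, 2, 3, 4, 6, 8],
})

# Named aggregation: new_column=(source_column, function)
out = (
    df.groupby(["NETWORK", "config_id"])
      .agg(
          APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
          Total_Cases=("Case", "sum"),
          Avg_Cases=("Case", "mean"),
          Avg_Delivery=("Delivery", "mean"),
      )
      .reset_index()
)
print(out)
```

Within each group the days are joined in their original row order. Note that `",".join` requires every value in APPLICABLE_DAYS to be a string; a NaN in that column would raise a TypeError.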