Is there a way to handle multi-condition grouping with pandas aggregation?

Time: 2020-05-17 10:46:01

Tags: python python-3.x pandas indexing pandas-groupby

I keep running into this kind of complex grouping and getting stuck on it. I would appreciate any help.

Input dataframe: [image: Input Dataframe]

I need this kind of output:

[image: Output]

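I need to group by each Unique_ID and, for every row where IF_Car_with_History == 1, get the sum of Value and the mean of Amount over the rows before it (which is what the code below does).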

I am currently trying to solve it with the following code, which is very slow:

import pandas as pd

rows = [[1, 120789, "2012-07-03", 0, 1000, 500],
        [1, 232101, "2015-05-06", 1, 2300, 700],
        [1, 329911, "2016-05-19", 1, 4000, 1000],
        [2, 129088, "2011-01-01", 0, 1200, 400],
        [2, 876541, "2013-03-01", 1, 1000, 600],
        [2, 864347, "2014-05-03", 0, 3000, 1000],
        [2, 987659, "2015-01-01", 1, 3200, 700]]

data = pd.DataFrame(rows, columns=["Unique_ID", "Transaction_ID", "Date", "IF_Car_with_History", "Value", "Amount"])

results = []  # one aggregated row per transaction with IF_Car_with_History == 1
for i in data["Unique_ID"].unique():
    df = data[data["Unique_ID"] == i].reset_index(drop=True)
    idx = df[df["IF_Car_with_History"] == 1].reset_index()["index"].tolist()
    for s in idx:
        hpa = df.iloc[s]["Transaction_ID"]     # the flagged transaction
        tmp = df.iloc[:s]                      # every row before it (must not be empty)
        T_no = tmp["Transaction_ID"].iloc[-1]  # the transaction just before it

        tmp = tmp.groupby(["Unique_ID"], as_index=False).agg(
            Value=("Value", "sum"),
            Amount=("Amount", "mean"))

        # tmp has a single row here, so plain column assignment is enough
        tmp["T_no"] = T_no
        tmp["HPA"] = hpa
        results.append(tmp)

# DataFrame.append was removed in pandas 2.0; collect the pieces and concatenate once
test_df = pd.concat(results, ignore_index=True)

This code takes a long time to run. Is there a better solution?

1 answer:

Answer 0 (score: 1)

You can do the following: group by Unique_ID and IF_Car_with_History, then take the sum of Value and the mean of Amount, and then merge in Transaction_ID.
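For comparison, here is a minimal loop-free sketch that reproduces what the question's nested loop computes. It uses a different technique from the grouping described in this answer: per-group cumulative sums with the current row subtracted, plus a shifted Transaction_ID. It assumes the rows are already in the desired order within each Unique_ID (as in the sample data) and that a flagged row with no earlier rows should be skipped. The names `n_prev`, `out`, and `result` are illustrative only.

```python
import pandas as pd

rows = [
    [1, 120789, "2012-07-03", 0, 1000, 500],
    [1, 232101, "2015-05-06", 1, 2300, 700],
    [1, 329911, "2016-05-19", 1, 4000, 1000],
    [2, 129088, "2011-01-01", 0, 1200, 400],
    [2, 876541, "2013-03-01", 1, 1000, 600],
    [2, 864347, "2014-05-03", 0, 3000, 1000],
    [2, 987659, "2015-01-01", 1, 3200, 700],
]
cols = ["Unique_ID", "Transaction_ID", "Date",
        "IF_Car_with_History", "Value", "Amount"]
df = pd.DataFrame(rows, columns=cols)

g = df.groupby("Unique_ID")
n_prev = g.cumcount()  # number of earlier rows within the same Unique_ID

out = pd.DataFrame({
    "Unique_ID": df["Unique_ID"],
    # cumulative sum minus the current row = sum over earlier rows only
    "Value": g["Value"].cumsum() - df["Value"],
    # same trick for Amount, divided by the count of earlier rows
    # (the count is masked to NaN where there are no earlier rows)
    "Amount": (g["Amount"].cumsum() - df["Amount"]) / n_prev.where(n_prev > 0),
    # Transaction_ID of the row just before, within the same Unique_ID
    "T_no": g["Transaction_ID"].shift(1),
    "HPA": df["Transaction_ID"],
})

# keep only flagged rows that have at least one earlier row
mask = (df["IF_Car_with_History"] == 1) & (n_prev > 0)
result = out[mask].reset_index(drop=True)
result["T_no"] = result["T_no"].astype("int64")  # shift() introduced NaN -> float

print(result)
```

On the sample data this yields one row per flagged transaction (four in total), with the same Value, Amount, T_no, and HPA columns the loop builds, but without slicing and re-aggregating the frame for every flagged row.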