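I keep getting confused by this kind of complicated grouping, and I would greatly appreciate any help.
I need this kind of output:
I need to group by each Unique_ID and, for every row where IF_Car_with_History == 1, get the sum of Value and the mean of Amount over the rows that come before it.
Right now I am trying to solve it with this code, which is very slow: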
    import pandas as pd

    data = [[1, 120789, "2012-07-03", 0, 1000, 500],
            [1, 232101, "2015-05-06", 1, 2300, 700],
            [1, 329911, "2016-05-19", 1, 4000, 1000],
            [2, 129088, "2011-01-01", 0, 1200, 400],
            [2, 876541, "2013-03-01", 1, 1000, 600],
            [2, 864347, "2014-05-03", 0, 3000, 1000],
            [2, 987659, "2015-01-01", 1, 3200, 700]]
    df = pd.DataFrame(data, columns=["Unique_ID", "Transaction_ID", "Date",
                                     "IF_Car_with_History", "Value", "Amount"])

    test_df = pd.DataFrame()  # collects one result row per flagged transaction
    for i in df.Unique_ID.unique():
        # All transactions of one Unique_ID, re-indexed from 0
        sub = df[df['Unique_ID'] == i].reset_index(drop=True)
        # Positions of the rows where IF_Car_with_History == 1
        idx = sub[sub['IF_Car_with_History'] == 1].reset_index()['index'].tolist()
        for s in idx:
            hpa = sub.iloc[s]["Transaction_ID"]    # the flagged transaction
            tmp = sub.iloc[:s]                     # every row before it
            T_no = tmp["Transaction_ID"].iloc[-1]  # the transaction just before it
            tmp = tmp.groupby(['Unique_ID'], as_index=False)\
                     .agg(Value=('Value', 'sum'),
                          Amount=('Amount', 'mean')).reset_index(drop=True)
            tmp["T_no"] = T_no
            tmp["HPA"] = hpa
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead
            test_df = pd.concat([test_df, tmp], ignore_index=True)
This snippet takes a long time to run. Is there a better solution?
Answer 0: (score: 1)
You can group by Unique_ID and IF_Car_with_History, then find the sum of Value and the mean of Amount, and then merge in Transaction_ID.
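As a separate sketch, and not the code from this answer, the loop in the question can probably be replaced by per-group cumulative operations. This assumes the rows are already in chronological order within each Unique_ID, and that "the rows before it" means every earlier row of the same Unique_ID, which is how the loop in the question behaves. The names g, prior and result are made up for illustration.

```python
import pandas as pd

data = [
    [1, 120789, "2012-07-03", 0, 1000, 500],
    [1, 232101, "2015-05-06", 1, 2300, 700],
    [1, 329911, "2016-05-19", 1, 4000, 1000],
    [2, 129088, "2011-01-01", 0, 1200, 400],
    [2, 876541, "2013-03-01", 1, 1000, 600],
    [2, 864347, "2014-05-03", 0, 3000, 1000],
    [2, 987659, "2015-01-01", 1, 3200, 700],
]
columns = ["Unique_ID", "Transaction_ID", "Date",
           "IF_Car_with_History", "Value", "Amount"]
df = pd.DataFrame(data, columns=columns)

# Assumption: rows are already sorted by date within each Unique_ID.
g = df.groupby("Unique_ID")

prior = pd.DataFrame({
    "Unique_ID": df["Unique_ID"],
    # Running sum within the group, minus the current row = sum of earlier rows
    "Value": g["Value"].cumsum() - df["Value"],
    # Sum of earlier Amounts divided by the number of earlier rows
    # (the first row of each group has no earlier rows and becomes NaN)
    "Amount": (g["Amount"].cumsum() - df["Amount"]) / g.cumcount(),
    # Transaction_ID of the previous row in the same group
    "T_no": g["Transaction_ID"].shift().astype("Int64"),
    # Transaction_ID of the current row
    "HPA": df["Transaction_ID"],
})

# Keep only the rows flagged with IF_Car_with_History == 1
result = prior[df["IF_Car_with_History"] == 1].reset_index(drop=True)
print(result)
```

Because every column is computed in one pass over the whole frame, there is no Python-level loop over IDs or rows, which is usually where the time goes on larger data. If the first row of a Unique_ID is itself flagged, this sketch yields NaN for Amount and a missing T_no for that row, whereas the loop in the question would raise an error there.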