My data is in the following DataFrame:
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})
AccID AccTypes Status Years
001 A Closed 5
001 B Active 15
001 C Active 10
002 A Active 20
002 B Closed 25
003 C Active 30
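I want to create another column named "ActiveYears". For rows with Status = Active, the value should be the maximum Years among the Active rows of the same AccID, regardless of AccTypes. For rows with Status = Closed, the value is simply that row's own Years. The expected output is as follows: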
AccID AccTypes Status Years ActiveYears Explanations
001 A Closed 5 5 # Status = Closed, we set ActiveYears = Years
001 B Active 15 15 # Status = Active, we select the maximum year of AccID = 001 with active status
001 C Active 10 15 # Status = Active, we select the maximum year of AccID = 001 with active status
002 A Active 20 20 # Status = Active, we select the maximum year of AccID = 002 with active status
002 B Closed 25 25 # Status = Closed, we set ActiveYears = Years
003 C Active 30 30 # Status = Active, we select the maximum year of AccID = 003 with active status
I can do this with a loop, but that is not elegant. How can I do this in a better way than looping? Thanks.
Answer 0 (score: 0)
You can use the following.
First, handle the rows whose Status is Closed:
df.loc[df.Status == 'Closed','ActiveYears'] = df.loc[df.Status == 'Closed','Years']
Then handle the Active rows with a groupby transform, which broadcasts each AccID's maximum back onto its rows:
df.loc[df.Status == 'Active', 'ActiveYears'] = df[df.Status == 'Active'].groupby('AccID')['Years'].transform('max')
print(df)
AccID AccTypes Status Years ActiveYears
0 001 A Closed 5 5.0
1 001 B Active 15 15.0
2 001 C Active 10 15.0
3 002 A Active 20 20.0
4 002 B Closed 25 25.0
5 003 C Active 30 30.0
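ActiveYears comes out as float above because the column is first created with missing values and then filled in two steps. As a possible variation (a sketch only, not part of the answer above), the same rule can be applied in one pass with Series.where and then cast back to integers. The names is_active and active_max are just illustrative, and the final cast assumes Years has no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'AccID': ['001', '001', '001', '002', '002', '003'],
    'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
    'Years': [5, 15, 10, 20, 25, 30],
})

# Boolean mask of the Active rows (illustrative helper name).
is_active = df['Status'].eq('Active')

# Mask Closed rows to NaN so they do not contribute, then take the
# per-AccID maximum and broadcast it back to every row of that AccID.
active_max = df['Years'].where(is_active).groupby(df['AccID']).transform('max')

# Closed rows keep their own Years; Active rows take the group maximum.
# The cast assumes Years contains no missing values.
df['ActiveYears'] = df['Years'].where(~is_active, active_max).astype(int)

print(df)
```

On the sample data this should give the same values as the two-step version, with an integer column instead of a float one.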