pandas groupby with a condition on one column to populate another column

Time: 2018-12-16 19:25:26

Tags: python pandas

I have the following dataframe, df_address, which contains student addresses:

student_id     address_type     Address           City
 1                R              6th street        MPLS
 1                P              10th street SE    Chicago
 1                E              10th street SE    Chicago
 2                P              Washington ST     Boston
 2                E              Essex St          NYC
 3                E              1040 Taft Blvd    Dallas
 4                R              24th street       NYC
 4                P              8th street SE     Chicago
 5                T              10 Riverside Ave  Boston
 6                               20th St           NYC

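Each student can have multiple address types:

R stands for "Residential", P for "Permanent", E for "Emergency", and T for "Temporary". address_type can also be blank.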

I want to populate an "IsPrimaryAddress" column based on the following logic:

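If address_type R exists for a given student, "Yes" should be written in the IsPrimaryAddress column for the R row, and "No" for that student_id's other address types.

If address_type R does not exist but P does, then IsPrimaryAddress = 'Yes' for 'P' and 'No' for the remaining types.

If neither P nor R exists but E does, then IsPrimaryAddress = 'Yes' for 'E'.

If none of P, R, or E exists but 'T' does, then IsPrimaryAddress = 'Yes' for 'T'.

The resulting dataframe would look like this: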

student_id     address_type     Address           City      IsPrimaryAddress
 1                R              6th street        MPLS      Yes
 1                P              10th street SE    Chicago   No
 1                E              10th street SE    Chicago   No
 2                P              Washington ST     Boston    Yes
 2                E              Essex St          NYC       No
 3                E              1040 Taft Blvd    Dallas    Yes
 4                R              24th street       NYC       Yes
 4                P              8th street SE     Chicago   No
 5                T              10 Riverside Ave  Boston    Yes
 6                               20th St           NYC       Yes

How can I achieve this? I tried the rank and cumsum functions on address_type, but could not get them to work.

1 Answer:

Answer 0: (score: 1)

First, convert address_type to an ordered Categorical so that it can be sorted in a custom order. Here df is the question's dataframe.
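As a small illustration of what an ordered Categorical changes, the sketch below sorts a toy Series by the priority order R, P, E, T, blank instead of alphabetically. The names `order`, `s`, and `cat` are made up for this example and are not part of the answer.

```python
import pandas as pd

# Priority order taken from the question: R first, blank last.
order = ["R", "P", "E", "T", ""]

# Toy values in arbitrary order (hypothetical data for illustration only).
s = pd.Series(["E", "", "R", "T", "P"])

# An ordered Categorical sorts and compares by position in `order`.
cat = pd.Series(pd.Categorical(s, categories=order, ordered=True))

sorted_vals = cat.sort_values().tolist()
print(sorted_vals)  # ['R', 'P', 'E', 'T', '']
print(cat.min())    # R
```

Values that are not listed in the categories become NaN, so a blank stored as NaN or as whitespace would need to be normalized to an empty string first.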

import pandas as pd

# Categories are listed from highest to lowest priority; the blank type comes last
df.address_type = pd.Categorical(df.address_type, ['R', 'P', 'E', 'T', ''], ordered=True)

df = df.sort_values('address_type')  # sort the rows by that priority
# Since the values are sorted, the first value of each group is the one to mark as Yes
df['new'] = (df.groupby('student_id').address_type.transform('first') == df.address_type).map({True: 'Yes', False: 'No'})
df = df.sort_index()  # restore the original row order of df


   student_id address_type  new
0           1            R  Yes
1           1            P   No
2           1            E   No
3           2            P  Yes
4           2            E   No
5           3            E  Yes
6           4            R  Yes
7           4            P   No
8           5            T  Yes
9           6               Yes
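
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built on the question's sample data. It is a variant of the same idea rather than the answer's exact code: it maps each address_type to a numeric priority and compares each row with its group minimum, so no sorting is needed. The names `PRIORITY` and `mark_primary` are hypothetical, and the sketch assumes the blank address_type is an empty string, NaN, or whitespace.

```python
import pandas as pd

# Sample data copied from the question; the blank type is assumed to be "".
df_address = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 6],
    "address_type": ["R", "P", "E", "P", "E", "E", "R", "P", "T", ""],
    "Address": ["6th street", "10th street SE", "10th street SE",
                "Washington ST", "Essex St", "1040 Taft Blvd",
                "24th street", "8th street SE", "10 Riverside Ave", "20th St"],
    "City": ["MPLS", "Chicago", "Chicago", "Boston", "NYC",
             "Dallas", "NYC", "Chicago", "Boston", "NYC"],
})

# Hypothetical priority table: a lower number means a higher priority.
PRIORITY = {"R": 0, "P": 1, "E": 2, "T": 3, "": 4}


def mark_primary(df):
    """Return a copy of df with an IsPrimaryAddress column added."""
    # Normalize missing values and stray whitespace to "" (assumption).
    types = df["address_type"].fillna("").str.strip()
    rank = types.map(PRIORITY)
    # Best (lowest) rank per student, broadcast back to every row.
    best = rank.groupby(df["student_id"]).transform("min")
    out = df.copy()
    out["IsPrimaryAddress"] = (rank == best).map({True: "Yes", False: "No"})
    return out


result = mark_primary(df_address)
print(result)
```

With either approach, a student who has two rows of the same top-priority type gets "Yes" on both rows. In this sketch, an address_type that is missing from `PRIORITY` maps to NaN and is therefore always marked "No".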