Create a column that aggregates another column, based on a condition

Time: 2019-05-09 05:34:07

Tags: python pandas

I have a DataFrame with 3 columns:

FLAG CLASS   STUDENT
yes 'Sci'   'Francy'
no  'Sci'   'Alex'
yes 'math'  'Arthur'
yes 'math'   NaN
yes 'eng'   'Jack'
yes 'math'  'Paul'
yes 'eng'   'Zach'

I want to add a new column, ALL_STUD, containing all the students of each class, but only for the rows where FLAG = yes. The expected result is:

FLAG CLASS   STUDENT   ALL_STUD
yes 'Sci'   'Francy'  'Francy, Alex'
no  'Sci'   'Alex'     NaN
yes 'math'  'Arthur'  'Arthur, Paul'
yes 'math'   NaN      'Arthur, Paul'
yes 'eng'   'Jack'    'Jack, Zach'
yes 'math'  'Paul'    'Arthur, Paul'
yes 'eng'   'Zach'    'Jack, Zach'

I have been trying something like this:

df.loc[df['FLAG']=='yes', 'ALL_STU'] = df.groupby('CLASS').STUDENT.transform(','.join)

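But the students of the 'math' class cannot be joined with (','.join) into 'Arthur, Paul', because the math class contains a missing name (NaN). Is there a solution, or another way to do this?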
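The failure can be reproduced outside of groupby. The sketch below is only an illustration: the `students` Series is a made-up stand-in for the 'math' group, and it is created with an explicit object dtype so that the missing value is an ordinary float NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the STUDENT values of the 'math' class.
students = pd.Series(["Arthur", np.nan, "Paul"], dtype=object)

try:
    ",".join(students)      # str.join accepts only str items
    error = None
except TypeError as exc:    # NaN is a float, so the join is rejected
    error = exc

print(type(error).__name__, error)
```

`str.join` requires every item to be a string, so the single float NaN is enough to make the whole group fail.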

This follows on from this question.

1 answer:

Answer 0 (score: 3)

Use Series.dropna to remove the missing values from each group before joining:

f = lambda x: ','.join(x.dropna())
#alternative 
#f = lambda x: ','.join(y for y in x if y == y)
df.loc[df['FLAG']=='yes', 'ALL_STU'] = df.groupby('CLASS').STUDENT.transform(f)
print (df)
  FLAG   CLASS   STUDENT          ALL_STU
0  yes   'Sci'  'Francy'  'Francy','Alex'
1   no   'Sci'    'Alex'              NaN
2  yes  'math'  'Arthur'  'Arthur','Paul'
3  yes  'math'       NaN  'Arthur','Paul'
4  yes   'eng'    'Jack'    'Jack','Zach'
5  yes  'math'    'Paul'  'Arthur','Paul'
6  yes   'eng'    'Zach'    'Jack','Zach'
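For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the answer: the sample frame is rebuilt by hand without the literal quote characters, the helper name `join_names` is made up, the separator is ", " to match the output requested in the question, and the column is called ALL_STUD as in the question's text.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (names stored without quote characters).
df = pd.DataFrame({
    "FLAG": ["yes", "no", "yes", "yes", "yes", "yes", "yes"],
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", np.nan, "Jack", "Paul", "Zach"],
})

def join_names(names):
    """Join the non-missing names of one class (hypothetical helper)."""
    return ", ".join(names.dropna())

# transform broadcasts the joined string back to every row of its class.
all_names = df.groupby("CLASS")["STUDENT"].transform(join_names)

# Only rows flagged 'yes' receive a value; the other rows stay NaN.
df.loc[df["FLAG"] == "yes", "ALL_STUD"] = all_names
print(df)
```

Note that the join here runs over every row of a class, so 'Alex' still appears in the 'Sci' result even though his own row is flagged 'no'. The commented-out alternative in the answer works because NaN compares unequal to itself, so `y == y` is false only for the missing values.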

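You can also filter on both sides, so that students from rows that do not match the condition (FLAG is not 'yes') are left out of the joined string: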

mask = df['FLAG']=='yes'
f = lambda x: ','.join(x.dropna())
df.loc[mask, 'ALL_STU'] = df.loc[mask, 'STUDENT'].groupby(df['CLASS']).transform(f)
print (df)
  FLAG   CLASS   STUDENT          ALL_STU
0  yes   'Sci'  'Francy'         'Francy'
1   no   'Sci'    'Alex'              NaN
2  yes  'math'  'Arthur'  'Arthur','Paul'
3  yes  'math'       NaN  'Arthur','Paul'
4  yes   'eng'    'Jack'    'Jack','Zach'
5  yes  'math'    'Paul'  'Arthur','Paul'
6  yes   'eng'    'Zach'    'Jack','Zach'
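A self-contained sketch of this second variant follows, under the same assumptions as before (hand-built sample data without quote characters, ", " as the separator, ALL_STUD as the column name). It filters the whole frame with `df[mask]` before grouping, which is a slightly different phrasing from the answer's `df.loc[mask, 'STUDENT'].groupby(df['CLASS'])` but should group the same rows.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (names stored without quote characters).
df = pd.DataFrame({
    "FLAG": ["yes", "no", "yes", "yes", "yes", "yes", "yes"],
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng"],
    "STUDENT": ["Francy", "Alex", "Arthur", np.nan, "Jack", "Paul", "Zach"],
})

mask = df["FLAG"] == "yes"

# Group only the flagged rows, so unflagged students never enter the join.
joined = (
    df[mask]
    .groupby("CLASS")["STUDENT"]
    .transform(lambda names: ", ".join(names.dropna()))
)

# Assignment aligns on the index; rows outside the mask stay NaN.
df.loc[mask, "ALL_STUD"] = joined
print(df)
```

With this version the 'Sci' row for Francy contains only 'Francy', because Alex's row is flagged 'no' and is excluded before grouping. Which of the two variants is right depends on whether unflagged students should still be listed for their classmates.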