I have a DataFrame with 3 columns:
FLAG CLASS STUDENT
yes 'Sci' 'Francy'
no 'Sci' 'Alex'
yes 'math' 'Arthur'
yes 'math' NaN
yes 'eng' 'Jack'
yes 'math' 'Paul'
yes 'eng' 'Zach'
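I want to add a new column, ALL_STUD, that lists all students in each class. However, the value should only be filled in for rows with FLAG = yes. The expected result looks like this: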
FLAG CLASS STUDENT ALL_STUD
yes 'Sci' 'Francy' 'Francy, Alex'
no 'Sci' 'Alex' NaN
yes 'math' 'Arthur' 'Arthur, Paul'
yes 'math' NaN 'Arthur, Paul'
yes 'eng' 'Jack' 'Jack, Zach'
yes 'math' 'Paul' 'Arthur, Paul'
yes 'eng' 'Zach' 'Jack, Zach'
I have been trying something like this:
df.loc[df['FLAG']=='yes', 'ALL_STU'] = df.groupby('CLASS').STUDENT.transform(','.join)
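But the students of the 'math' class cannot be turned into 'Arthur, Paul' with (','.join), because the math class contains a missing name (NaN). Is there a solution, or another way to do this?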
This follows on from this question.
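To make the failure concrete, here is a minimal sketch in plain Python, outside pandas. It assumes the names are bare strings and that the missing value is an ordinary float NaN, which is how pandas stores missing entries in an object column. `str.join` only accepts strings, so the NaN is what triggers the error.

```python
# One missing entry, represented as a float NaN (assumption: same as in the column).
names = ['Arthur', float('nan'), 'Paul']

try:
    joined = ', '.join(names)
except TypeError as exc:
    # str.join rejects the float, so nothing is joined.
    joined = None
    error_message = str(exc)

# Removing the non-string entry first lets the join succeed.
clean = [n for n in names if isinstance(n, str)]
joined_clean = ', '.join(clean)
print(joined_clean)  # Arthur, Paul
```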
Answer 0 (score: 3)
# drop missing names before joining
f = lambda x: ','.join(x.dropna())
# alternative: NaN is never equal to itself, so y == y filters it out
#f = lambda x: ','.join(y for y in x if y == y)
df.loc[df['FLAG']=='yes', 'ALL_STU'] = df.groupby('CLASS').STUDENT.transform(f)
print (df)
FLAG CLASS STUDENT ALL_STU
0 yes 'Sci' 'Francy' 'Francy','Alex'
1 no 'Sci' 'Alex' NaN
2 yes 'math' 'Arthur' 'Arthur','Paul'
3 yes 'math' NaN 'Arthur','Paul'
4 yes 'eng' 'Jack' 'Jack','Zach'
5 yes 'math' 'Paul' 'Arthur','Paul'
6 yes 'eng' 'Zach' 'Jack','Zach'
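For anyone who wants to run this end to end, the following is a self-contained sketch of the same approach. The DataFrame construction is my reconstruction of the question's data. I assume the names are stored without the surrounding quotes and I use ', ' as the separator to match the output requested in the question. The helper names `join_dropna` and `join_self_equal` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (assumption: names stored without quote characters).
df = pd.DataFrame({
    'FLAG': ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng'],
    'STUDENT': ['Francy', 'Alex', 'Arthur', np.nan, 'Jack', 'Paul', 'Zach'],
})

def join_dropna(s):
    # Drop missing names, then join the remaining ones.
    return ', '.join(s.dropna())

def join_self_equal(s):
    # NaN != NaN, so the comparison keeps only real names.
    return ', '.join(y for y in s if y == y)

# transform broadcasts the joined string to every row of its class.
by_dropna = df.groupby('CLASS')['STUDENT'].transform(join_dropna)
by_self_equal = df.groupby('CLASS')['STUDENT'].transform(join_self_equal)

# Only rows flagged 'yes' receive the value; the others stay NaN.
df.loc[df['FLAG'] == 'yes', 'ALL_STU'] = by_dropna
print(df)
```

Note that the grouping here runs over all rows, so 'Alex' still appears in the 'Sci' list even though his own row has FLAG = no.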
You can also filter on both sides, so that names from rows that do not match the condition are not appended:
mask = df['FLAG']=='yes'
f = lambda x: ','.join(x.dropna())
df.loc[mask, 'ALL_STU'] = df.loc[mask, 'STUDENT'].groupby(df['CLASS']).transform(f)
print (df)
FLAG CLASS STUDENT ALL_STU
0 yes 'Sci' 'Francy' 'Francy'
1 no 'Sci' 'Alex' NaN
2 yes 'math' 'Arthur' 'Arthur','Paul'
3 yes 'math' NaN 'Arthur','Paul'
4 yes 'eng' 'Jack' 'Jack','Zach'
5 yes 'math' 'Paul' 'Arthur','Paul'
6 yes 'eng' 'Zach' 'Jack','Zach'
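A runnable sketch of this second variant, under the same assumptions as before (reconstructed data, names stored without quotes, ', ' as separator, and a hypothetical helper name `join_names`). Because the mask is applied before grouping, students on FLAG = no rows never reach the join.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (assumption: names stored without quote characters).
df = pd.DataFrame({
    'FLAG': ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'CLASS': ['Sci', 'Sci', 'math', 'math', 'eng', 'math', 'eng'],
    'STUDENT': ['Francy', 'Alex', 'Arthur', np.nan, 'Jack', 'Paul', 'Zach'],
})

def join_names(s):
    # Drop missing names, then join the remaining ones.
    return ', '.join(s.dropna())

mask = df['FLAG'] == 'yes'

# Group only the flagged rows; the CLASS Series is aligned on the index.
flagged = df.loc[mask, 'STUDENT'].groupby(df['CLASS']).transform(join_names)
df.loc[mask, 'ALL_STU'] = flagged
print(df)
```

The visible difference from the first variant is the 'Sci' class: row 0 now gets only 'Francy', since 'Alex' sits on a FLAG = no row and is excluded before grouping.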