I've been stuck on a pandas problem that I can't seem to figure out. I have a dataframe like this:
ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan
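I want to add a new column to this dataframe using the following pseudocode:
for each unique value in ref, if result = fail,
then new_column = no
for every row with that same "ref" value (otherwise new_column = yes).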
This is what the new dataframe should look like:
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes
What I have managed to do so far is this:
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes
This was done with df.loc.
But I need to apply the function per unique value of ref, not per row.
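A minimal sketch of how the df.loc approach from the question could be extended to work per ref instead of per row, assuming the data shown above: first use df.loc to collect the refs that have at least one fail, then use Series.isin to flag every row belonging to those refs. This is an alternative to the groupby answer below, not the asker's actual code; the name failed_refs is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ref':    ['a100', 'a100', 'a100', 'a100', 'a101', 'a101', 'a102'],
    'value':  [25, 25, 25, 25, 15, 15, 20],
    'rule':   ['high', 'high', 'medium', 'medium', 'high', 'high', 'high'],
    'result': ['fail', 'pass', 'fail', 'pass', 'fail', 'pass', 'pass'],
})

# Refs that have at least one failing row (failed_refs is a hypothetical name)
failed_refs = df.loc[df['result'] == 'fail', 'ref'].unique()

# Every row of a failing ref gets 'no', all other rows get 'yes'
df['new_column'] = np.where(df['ref'].isin(failed_refs), 'no', 'yes')
print(df)
```

Because isin only checks membership in the set of failing refs, all rows of a100 and a101 get no regardless of where the fail appears in the group, and a102 gets yes.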
Answer (score: 3)
I think you can use transform:
print (df)
ref value rule result new_column
0 a100 25 high pass NaN
1 a100 25 high fail NaN
2 a100 25 medium fail NaN
3 a100 25 medium pass NaN
4 a101 15 high fail NaN
5 a101 15 high pass NaN
6 a102 20 high pass NaN
# 'no' if any row of the ref group failed, else 'yes'; transform broadcasts it to every row
df['new_column'] = (df.groupby('ref')['result']
                    .transform(lambda x: 'no' if (x == 'fail').any() else 'yes'))
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes
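The lambda above calls a Python function once per group. A sketch of a fully vectorized variant, assuming the same frame as in the answer: compare the whole result column to 'fail' first, let groupby(...).transform('any') broadcast each ref's answer back to its rows, then map the booleans to labels. The name any_fail is illustrative.

```python
import pandas as pd

# Same frame as in the answer (row order differs slightly from the question)
df = pd.DataFrame({
    'ref':    ['a100', 'a100', 'a100', 'a100', 'a101', 'a101', 'a102'],
    'value':  [25, 25, 25, 25, 15, 15, 20],
    'rule':   ['high', 'high', 'medium', 'medium', 'high', 'high', 'high'],
    'result': ['pass', 'fail', 'fail', 'pass', 'fail', 'pass', 'pass'],
})

# Row-level flag, then "does any row of this ref fail?" broadcast back to each row
any_fail = df['result'].eq('fail').groupby(df['ref']).transform('any')

# Translate True/False into the requested labels
df['new_column'] = any_fail.map({True: 'no', False: 'yes'})
print(df)
```

This gives the same new_column as the lambda version, but the comparison and the per-group "any" are done by pandas itself rather than by a Python function per group.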
Thanks to Jon Clements for another solution using replace:
# True for every row of a ref that has any fail, then relabel True/False as 'no'/'yes'
df['new_column'] = (df.groupby('ref')['result']
                    .transform(lambda L: (L == 'fail').any())
                    .replace({True: 'no', False: 'yes'}))
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes
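If instead only the rows from the first fail onward within each ref should be marked no (rows before it staying yes), a running count of fails per group handles that. This is a sketch under that assumed interpretation, using a trimmed frame with only ref and result in the answer's row order; seen_fail is an illustrative name.

```python
import pandas as pd

# Trimmed frame in the answer's row order: a100 passes once before its first fail
df = pd.DataFrame({
    'ref':    ['a100', 'a100', 'a100', 'a100', 'a101', 'a101', 'a102'],
    'result': ['pass', 'fail', 'fail', 'pass', 'fail', 'pass', 'pass'],
})

# Running count of fails within each ref; > 0 means a fail has already occurred
seen_fail = df['result'].eq('fail').astype(int).groupby(df['ref']).cumsum() > 0

df['new_column'] = seen_fail.map({True: 'no', False: 'yes'})
print(df)
```

With this row order, row 0 (a100, pass, before any fail) becomes yes, whereas both solutions above mark it no because they look at the whole group.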