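I am trying to compare two columns, a primary column and a secondary column. The secondary column may have a period (.) or text such as " (On Leave)" after the required string.
I learned that to replace a literal ".", it has to be passed as "\.".
If the secondary column contains a specific value such as "NOTAPPLICABLEHERE", I treat the result as True.
For this, I created a variable:
Exceptions = "NOTAPPLICABLEHERE"
Below is the code I wrote: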
temp_result_df[res_col_name] = (temp_result_df[primaryreportreqcolname].eq(temp_result_df[RequiredSecondaryReport_Col_Name].str.replace \
('\.'|' (On Leave)', '', regex = True)) | (temp_result_df[RequiredSecondaryReport_Col_Name]== Exceptions))
It fails with the error message: unsupported operand type(s) for |: 'str' and 'str'.
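For context, this error is raised by Python itself before pandas ever receives the pattern: `'\.'|' (On Leave)'` asks Python to apply the `|` operator to two separate string objects. The sketch below reproduces that with the standard library only; the names `error_message`, `single_pattern`, and `cleaned` are illustrative and do not come from the original post.

```python
import re

# Two separate str objects joined with | are evaluated by Python,
# which has no | operator for strings.
try:
    pattern = r'\.' | ' (On Leave)'
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Putting the | inside one string makes it a regex alternation instead.
# The parentheses are escaped so they match literally rather than forming a group.
single_pattern = r'\.| \(On Leave\)'
cleaned = re.sub(single_pattern, '', 'Mr (On Leave)')
print(cleaned)
```

The same single-string pattern is what `str.replace` needs when called with `regex=True`.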
PrimaryColumn  SecondaryColumn    ExpectedOutput
Mr             Mr.                True
Jr             Jr                 True
Mrs            Mrs                True
Mr             Mrs                False
Mr             Mr (On Leave)      True
Mr             NOTAPPLICABLEHERE  True
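Please help me.
Answer 0 (score: 1)
I think the problem is in the regex: the parentheses () need to be escaped, and the quotes '' around | need to be removed so that the whole pattern is a single string: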
p = 'PrimaryColumn'
s = 'SecondaryColumn'
Exceptions = "NOTAPPLICABLEHERE"
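# regex=True must be passed explicitly on pandas 2.0+, where str.replace matches literally by default
df['new'] = df[p].eq(df[s].str.replace(r'\.| \(On Leave\)', '', regex=True)) | (df[s] == Exceptions)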
Or:
# The outer parentheses let the expression continue onto a second line
df['new'] = (df[p].eq(df[s].replace(r'\.| \(On Leave\)', '', regex=True)) |
             (df[s] == Exceptions))
print(df)
   PrimaryColumn    SecondaryColumn  ExpectedOutput    new
0             Mr                Mr.            True   True
1             Jr                 Jr            True   True
2            Mrs                Mrs            True   True
3             Mr                Mrs           False  False
4             Mr      Mr (On Leave)            True   True
5             Mr  NOTAPPLICABLEHERE            True   True
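To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the first variant. The DataFrame construction and the names `cleaned` and `all_match` are assumptions added for illustration; the pattern and the comparison logic are the ones from the answer above.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'PrimaryColumn': ['Mr', 'Jr', 'Mrs', 'Mr', 'Mr', 'Mr'],
    'SecondaryColumn': ['Mr.', 'Jr', 'Mrs', 'Mrs', 'Mr (On Leave)', 'NOTAPPLICABLEHERE'],
    'ExpectedOutput': [True, True, True, False, True, True],
})

p = 'PrimaryColumn'
s = 'SecondaryColumn'
Exceptions = "NOTAPPLICABLEHERE"

# Strip a literal "." or the literal text " (On Leave)" from the secondary column.
cleaned = df[s].str.replace(r'\.| \(On Leave\)', '', regex=True)

# True where the cleaned value equals the primary value, or where the
# secondary value is the exception marker.
df['new'] = df[p].eq(cleaned) | df[s].eq(Exceptions)

# Check the computed column against the expected one, row by row.
all_match = bool((df['new'] == df['ExpectedOutput']).all())
print(df)
print(all_match)
```

Computing `cleaned` as a separate step keeps the regex replacement apart from the comparison, which makes it easier to inspect the intermediate values if a row does not match.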