Python Pandas - Replacing related text fails

Asked: 2019-02-19 13:05:44

Tags: python-3.x pandas

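I am trying to compare two columns, a primary and a secondary one. The secondary column may have a period (.) or text such as " (On Leave)" after the required string.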

I learned that to replace a literal period ("."), it has to be escaped in the pattern as "\.".

If the secondary column contains a specific value such as "NOTAPPLICABLEHERE", I treat the result as True.

For this, I created a variable:

Exceptions = "NOTAPPLICABLEHERE"

Below is the code I wrote:

temp_result_df[res_col_name]  = (temp_result_df[primaryreportreqcolname].eq(temp_result_df[RequiredSecondaryReport_Col_Name].str.replace \
                                ('\.'|' (On Leave)', '', regex = True)) | (temp_result_df[RequiredSecondaryReport_Col_Name]== Exceptions))

It fails with the error message: unsupported operand type(s) for |: 'str' and 'str'.

PrimaryColumn   SecondaryColumn    ExpectedOutput
Mr               Mr.                  True
Jr               Jr                   True
Mrs              Mrs                  True
Mr               Mrs                  False
Mr               Mr (On Leave)        True
Mr               NOTAPPLICABLEHERE    True

Please help me.

1 Answer:

Answer 0 (score: 1)

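I think the problem is with the regex: you need to escape the parentheses and remove the quotes ('') around the |, so that the whole pattern is a single string.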
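As a rough illustration of why the quotes matter (a sketch in plain Python using the standard `re` module, not part of the original answer; the names `left`, `right`, `error_message`, and `cleaned` are made up for this example): with two separate string literals, Python applies its own `|` operator to them before pandas ever sees a pattern, whereas a `|` inside one string is a regex alternation.

```python
import re

# Two separate strings joined with Python's | operator: this fails
# before any replace call happens, as in the question.
left = r'\.'
right = ' (On Leave)'
try:
    pattern = left | right
except TypeError as exc:
    error_message = str(exc)

# One raw string with | inside it: a regex alternation.
# The parentheses are escaped so they match literally instead of forming a group.
pattern = r'\.| \(On Leave\)'

cleaned = [re.sub(pattern, '', value) for value in ['Mr.', 'Mr (On Leave)', 'Mrs']]

print(error_message)
print(cleaned)
```

The same single-string pattern is what gets passed to pandas in the code that follows.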

p = 'PrimaryColumn'
s = 'SecondaryColumn'

Exceptions = "NOTAPPLICABLEHERE"

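# regex=True is passed explicitly: from pandas 2.0 on, str.replace treats the pattern as a literal string by default
df['new'] = df[p].eq(df[s].str.replace(r'\.| \(On Leave\)', '', regex=True)) | (df[s] == Exceptions)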

Or:

# Outer parentheses let the expression continue onto the next line
df['new'] = (df[p].eq(df[s].replace(r'\.| \(On Leave\)', '', regex=True)) |
             (df[s] == Exceptions))

print (df)
  PrimaryColumn    SecondaryColumn  ExpectedOutput    new
0            Mr                Mr.            True   True
1            Jr                 Jr            True   True
2           Mrs                Mrs            True   True
3            Mr                Mrs           False  False
4            Mr      Mr (On Leave)            True   True
5            Mr  NOTAPPLICABLEHERE            True   True
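
For anyone who wants to reproduce this end to end, here is a self-contained sketch. It assumes the sample data from the question is built by hand into a DataFrame (the construction step and the intermediate name `cleaned` are not in the original posts), and it passes `regex=True` explicitly so that it does not depend on the pandas version's default.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'PrimaryColumn': ['Mr', 'Jr', 'Mrs', 'Mr', 'Mr', 'Mr'],
    'SecondaryColumn': ['Mr.', 'Jr', 'Mrs', 'Mrs', 'Mr (On Leave)', 'NOTAPPLICABLEHERE'],
    'ExpectedOutput': [True, True, True, False, True, True],
})

p = 'PrimaryColumn'
s = 'SecondaryColumn'
Exceptions = "NOTAPPLICABLEHERE"

# Strip a literal period or the literal text " (On Leave)" from the secondary column.
cleaned = df[s].str.replace(r'\.| \(On Leave\)', '', regex=True)

# True when the cleaned value equals the primary value, or when the
# secondary value is the exception marker.
df['new'] = df[p].eq(cleaned) | df[s].eq(Exceptions)

print(df)
```

Splitting the cleaning step into its own variable is only a readability choice; the one-line forms above do the same thing.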