Question

Background

I have the following sample df

import pandas as pd
df = pd.DataFrame({'Birthdate':['This person was born Date of Birth: 5/6/1950 and other',
                          'no Date of Birth: nothing here',
                          'One Date of Birth: 01/01/2001 last here'], 
                  'P_ID': [1,2,3],
                  'N_ID' : ['A1', 'A2', 'A3']} 

                 )

 df
                                 Birthdate                 N_ID P_ID
    0   This person was born Date of Birth: 5/6/1950 a...   A1  1
    1   no Date of Birth: nothing here                      A2  2
    2   One Date of Birth: 01/01/2001 last here             A3  3

Goal

Replace the leading part of the birthdate with *BDAY*, e.g. 5/6/1950 becomes *BDAY*1950

Desired output

                                 Birthdate                 N_ID P_ID
    0   This person was born Date of Birth: *BDAY*1950 a... A1  1
    1   no Date of Birth: nothing here                      A2  2
    2   One Date of Birth: *BDAY*2001 last here             A3  3

Attempt

Following python - Replace first five characters in a column with asterisks, I tried the following code: df.replace(r'Date of Birth: ^\d{3}-\d{2}', "*BDAY*", regex=True), but it doesn't quite give me what I want

Question

How do I achieve the desired output?

Answer 1

Try this:

get_function_hook()
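
One possible way to do this, sketched under the assumption that only the Birthdate column needs changing. It uses pandas' Series.str.replace with regex=True and is not necessarily what this answer originally proposed:

```python
import pandas as pd

df = pd.DataFrame({
    'Birthdate': ['This person was born Date of Birth: 5/6/1950 and other',
                  'no Date of Birth: nothing here',
                  'One Date of Birth: 01/01/2001 last here'],
    'P_ID': [1, 2, 3],
    'N_ID': ['A1', 'A2', 'A3'],
})

# Mask the "month/day/" part that follows the label and keep the year.
# The capture group preserves the label; \d{1,2} accepts 5/6/ as well as 01/01/.
df['Birthdate'] = df['Birthdate'].str.replace(
    r'(Date of Birth:\s+)\d{1,2}/\d{1,2}/',
    r'\1*BDAY*',
    regex=True,
)

print(df['Birthdate'].tolist())
```

Rows without a date after the label are left unchanged, because the pattern requires the digits and slashes to be present.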

Answer 2

This expression might also work:

import pandas as pd
df = pd.DataFrame({'Birthdate':['This person was born Date of Birth: 5/6/1950 and other',
                          'no Date of Birth: nothing here',
                          'One Date of Birth: 01/01/2001 last here'], 
                  'P_ID': [1,2,3],
                  'N_ID' : ['A1', 'A2', 'A3']} 

                 )
df= df.replace(r'(?i)date\s+of\s+birth:\s+\d{1,2}/\d{1,2}/', "Date of Birth: *BDAY*", regex=True)

print(df)

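The expression is explained in the top-right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link you can see how it matches against some sample inputs, if you like.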
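
The same expression can also be tried locally with Python's standard re module. The sketch below is only an illustration; the fourth sample string is made up here to show the effect of the (?i) flag:

```python
import re

# Same pattern as in the answer: case-insensitive label, flexible whitespace,
# then one or two digits for month and day, each followed by '/'.
pattern = re.compile(r'(?i)date\s+of\s+birth:\s+\d{1,2}/\d{1,2}/')

samples = [
    'This person was born Date of Birth: 5/6/1950 and other',
    'no Date of Birth: nothing here',
    'One Date of Birth: 01/01/2001 last here',
    'DATE OF BIRTH:  7/4/1976',  # hypothetical input, not in the question
]

results = [pattern.sub('Date of Birth: *BDAY*', s) for s in samples]
for line in results:
    print(line)
```

Because the label is part of the match and is written back literally, any variation in case or spacing in the input is normalized to "Date of Birth: " in the output.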

Answer 3

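The regex is wrong

Your regex looks for 3 digits, then a '-', then 2 digits, and the '^' in the middle of the pattern can never match there. Your sample data has one or two digits, a '/', then one or two digits, then another '/'.

Try:

# \d{1,2} matches both 5/6/ and 01/01/; \1 puts the label back
df = df.replace(
    r'(Date of Birth:\s+)\d{1,2}/\d{1,2}/',
    r"\1*BDAY*",
    regex=True)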
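
To see which rows a pattern would touch before applying it to the DataFrame, the candidates can be checked against the sample strings with the standard re module. This is a rough sketch; the variable names are illustrative only:

```python
import re

samples = [
    'This person was born Date of Birth: 5/6/1950 and other',
    'no Date of Birth: nothing here',
    'One Date of Birth: 01/01/2001 last here',
]

# Pattern from the question: '^' after the label and the wrong digit/separator layout.
attempt = r'Date of Birth: ^\d{3}-\d{2}'
# Strict variant: requires exactly two digits for month and day.
strict = r'(Date of Birth:\s+)\d{2}/\d{2}/'
# Flexible variant: accepts one or two digits for month and day.
flexible = r'(Date of Birth:\s+)\d{1,2}/\d{1,2}/'

def matches(pattern):
    """Return, per sample string, whether the pattern is found anywhere in it."""
    return [bool(re.search(pattern, s)) for s in samples]

attempt_hits = matches(attempt)
strict_hits = matches(strict)
flexible_hits = matches(flexible)

print(attempt_hits)   # [False, False, False]
print(strict_hits)    # [False, False, True]
print(flexible_hits)  # [True, False, True]
```

Only the flexible variant finds both dates, since 5/6/1950 is not zero-padded.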

Pandas: replacing the first few digits of a birthdate
