Background
I have the following sample df:
import pandas as pd
df = pd.DataFrame({'Birthdate':['This person was born Date of Birth: 5/6/1950 and other',
'no Date of Birth: nothing here',
'One Date of Birth: 01/01/2001 last here'],
'P_ID': [1,2,3],
'N_ID' : ['A1', 'A2', 'A3']}
)
df
Birthdate N_ID P_ID
0 This person was born Date of Birth: 5/6/1950 a... A1 1
1 no Date of Birth: nothing here A2 2
2 One Date of Birth: 01/01/2001 last here A3 3
Goal
Replace the leading month/day part of each birthdate with *BDAY*, so that, for example, 5/6/1950 becomes *BDAY*1950.
Desired output
Birthdate N_ID P_ID
0 This person was born Date of Birth: *BDAY*1950 a... A1 1
1 no Date of Birth: nothing here A2 2
2 One Date of Birth: *BDAY*2001 last here A3 3
Attempt
Following "python - Replace first five characters in a column with asterisks", I tried the following code:
df.replace(r'Date of Birth: ^\d{3}-\d{2}', "*BDAY*", regex=True)
but it does not give me the result I expect.
Question
How can I achieve the desired output?
Answer 0: (score: 1)
Try this:
get_function_hook()
Answer 1: (score: 1)
This expression might also work:
import pandas as pd
df = pd.DataFrame({'Birthdate':['This person was born Date of Birth: 5/6/1950 and other',
'no Date of Birth: nothing here',
'One Date of Birth: 01/01/2001 last here'],
'P_ID': [1,2,3],
'N_ID' : ['A1', 'A2', 'A3']}
)
df= df.replace(r'(?i)date\s+of\s+birth:\s+\d{1,2}/\d{1,2}/', "Date of Birth: *BDAY*", regex=True)
print(df)
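The expression is explained in the top-right panel of regex101.com, in case you want to explore, simplify, or modify it. In this link you can also see how it matches against some sample inputs.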
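As a quick check outside of pandas, the same pattern can be applied to the three sample strings with the standard-library `re` module. This is only a sketch: the names `pattern`, `samples`, and `results` are illustrative, and it assumes the date always follows the text "Date of Birth:" with a one- or two-digit month and day separated by '/'.

```python
import re

# Same pattern as in the df.replace call above.
# (?i)        case-insensitive matching
# \s+         one or more whitespace characters
# \d{1,2}/    one or two digits followed by a slash (month, then day)
pattern = re.compile(r'(?i)date\s+of\s+birth:\s+\d{1,2}/\d{1,2}/')

samples = [
    'This person was born Date of Birth: 5/6/1950 and other',
    'no Date of Birth: nothing here',
    'One Date of Birth: 01/01/2001 last here',
]

# Rows without a date are returned unchanged because the pattern does not match.
results = [pattern.sub('Date of Birth: *BDAY*', s) for s in samples]

for line in results:
    print(line)
```

Because the label is part of the match, it is written out again in the replacement string. A capture group would avoid repeating it.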
Answer 2: (score: 0)
Your regex looks for 3 digits, then a '-', then 2 digits. Your sample data has 1 or 2 digits, a '/', then 1 or 2 digits.
Try:
df.replace(
    r'(Date of Birth:\s+)\d{1,2}/\d{1,2}/',
    r"\1*BDAY*",
    regex=True)
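To see the difference between the two patterns on a single string, here is a small sketch using the standard-library `re` module instead of a DataFrame. The variable names `text`, `attempt`, and `fixed` are illustrative, and the second pattern assumes one- or two-digit month and day fields, as in the sample data.

```python
import re

text = 'This person was born Date of Birth: 5/6/1950 and other'

# Pattern from the question: '^' anchors to the start of the string, so it
# cannot match after 'Date of Birth: ', and the sample dates use '/' not '-'.
attempt = re.sub(r'Date of Birth: ^\d{3}-\d{2}', '*BDAY*', text)

# Capture-group variant: group 1 keeps the label, and \d{1,2} accepts
# both '5/6/' and '01/01/'.
fixed = re.sub(r'(Date of Birth:\s+)\d{1,2}/\d{1,2}/', r'\1*BDAY*', text)

print(attempt)  # unchanged input
print(fixed)
```

The `\1` in the replacement puts the captured label back, so only the month and day are replaced.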