Background
I have the following sample df, which is a variation on "Alter number string in pandas column":
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Smith Record #: 0000004 is this ',
'Record #: 0000003 Mary Lisa Hider found here',
'Jane A Doe is also here Record #: 0000002',
'Record #: 0000001'],
'P_ID': [1,2,3,4],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df
Text N_ID P_ID
0 Jon J Smith Record #: 0000004 is this A1 1
1 Record #: 0000003 Mary Lisa Hider fou... A2 2
2 Jane A Doe is also here Record #: 000... A3 3
3 Record #: 0000001 A4 4
Goal
1) Replace the number after Record #: with **BLOCK**
Jon J Smith Record #: 0000004 is this
Jon J Smith Record #: **BLOCK** is this
2) Store the result in a new column, New_Text
Desired output
Text N_ID P_ID New_Text
0 Jon J Smith Record #: **BLOCK** is this
1 Record #: **BLOCK** Mary Lisa Hider fou...
2 Jane A Doe is also here Record #: **BLOCK**
3 Record #: **BLOCK**
Attempt
I tried the following, but it is not quite right:
df['New_Text']= df['Text'].replace(r'(?i)record\s+#: \d+', r"Date of Birth: **BLOCK**", regex=True)
Question
How do I change my code to get the desired output?
Answer 0 (score: 1)
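Your pattern matches exactly one space after the colon. You can turn that into \s+ (or, if the separator can only be spaces, repeat the space with " +"), and wrap the first part in a capture group so it can be kept in the replacement.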
(?i)(record\s+#:\s+)\d+
In the replacement, use
\1**BLOCK**
The final code looks like this:
df['New_Text']= df['Text'].replace(r'(?i)(record\s+#:\s+)\d+', r"\1**BLOCK**", regex=True)
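To check the capture-group approach against the sample data from the question, here is a minimal sketch. It assumes the label in the text is always "Record #:" followed by whitespace and digits, as in the four sample rows; the variable name `pattern` is only for illustration.

```python
import pandas as pd

# Sample data from the question (only the Text column is needed here).
df = pd.DataFrame({'Text': ['Jon J Smith Record #: 0000004 is this ',
                            'Record #: 0000003 Mary Lisa Hider found here',
                            'Jane A Doe is also here Record #: 0000002',
                            'Record #: 0000001']})

# Group 1 captures the "Record #: " label (case-insensitive because of (?i));
# \d+ matches the digits that follow and is the only part that gets replaced.
pattern = r'(?i)(record\s+#:\s+)\d+'

# \1 in the replacement puts the captured label back in front of **BLOCK**.
df['New_Text'] = df['Text'].replace(pattern, r'\1**BLOCK**', regex=True)

print(df['New_Text'].tolist())
```

Because the label is written back through \1, the original spacing and capitalisation of "Record #:" are preserved, and the text before and after the number is left untouched.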