Question

Background

I have the following sample df, which is a variation on the question "Alter number string in pandas column"

import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Smith  Record #:  0000004 is this ', 
                                   'Record #:  0000003 Mary Lisa Hider found here', 
                                   'Jane A Doe is also here Record #:  0000002',
                                'Record #:  0000001'], 

                      'P_ID': [1,2,3,4],
                      'N_ID' : ['A1', 'A2', 'A3', 'A4']

                     })

#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df

                                    Text             N_ID   P_ID
0   Jon J Smith Record #: 0000004 is this       A1  1
1   Record #: 0000003 Mary Lisa Hider fou...    A2  2
2   Jane A Doe is also here Record #: 000...    A3  3
3   Record #: 0000001                           A4  4

Goal

1) Replace the number after Record #: with **BLOCK**

Jon J Smith Record #: 0000004 is this
Jon J Smith Record #: **BLOCK** is this

2) Create a new column

Desired output

    Text    N_ID    P_ID    New_Text              
0                          Jon J Smith Record #: **BLOCK** is this      
1                          Record #: **BLOCK**  Mary Lisa Hider fou...  
2                          Jane A Doe is also here Record #: **BLOCK**  
3                          Record #: **BLOCK**

Attempt

I tried the following, but it is not quite right

df['New_Text']= df['Text'].replace(r'(?i)record\s+#: \d+', r"Date of Birth: **BLOCK**", regex=True)

Question

How do I change my code to get the desired output?

Answer 1

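Your pattern matches exactly one space after the colon, but the sample text has two. You can change it to \s+ (or, if only spaces can occur there, a repeated space " +"), and wrap the first part in a capture group so it can be kept in the replacement.

(?i)(record\s+#:\s+)\d+

Regex demo

In the replacement, use

\1**BLOCK**

The final code looks like this:

df['New_Text']= df['Text'].replace(r'(?i)(record\s+#:\s+)\d+', r"\1**BLOCK**", regex=True)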
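For reference, here is a minimal self-contained sketch of the capture-group approach, run against the sample df from the question. It assumes the label in the text is just "Record #:" (as in the sample rows), so the pattern has no leading "medical"; adjust the pattern if your real data has a longer label.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Text': ['Jon J Smith  Record #:  0000004 is this ',
             'Record #:  0000003 Mary Lisa Hider found here',
             'Jane A Doe is also here Record #:  0000002',
             'Record #:  0000001'],
    'N_ID': ['A1', 'A2', 'A3', 'A4'],
    'P_ID': [1, 2, 3, 4],
})

# Group 1 captures the label, the colon and any whitespace after it;
# \d+ is the number that gets masked. (?i) makes the match case-insensitive.
# Assumption: the label is "Record #:" with no "medical" prefix.
pattern = r'(?i)(record\s+#:\s+)\d+'

# \1 puts the captured label back, so only the digits are replaced
df['New_Text'] = df['Text'].replace(pattern, r'\1**BLOCK**', regex=True)

print(df['New_Text'].tolist())
```

Because the whitespace sits inside the capture group, the original spacing after the colon is preserved in New_Text, and the Text column is left unchanged.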
