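Modification to blocking seven digit numbers in string pandas

Time: 2019-08-25 21:37:46

Tags: python regex pandas text replace

Background

I have the following df, which is a modification of "blocking seven digit numbers in string pandas":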

import pandas as pd
df = pd.DataFrame({'Text':['This person num is (111)888-8780 and other',
                          'dont block 23 here',
                          'two numbers: 001-002-1234 and here',
                          'block this (666)6636666',
                           '1-510-999-9999 is one more'], 
                  'P_ID': [1,2,3,4,5],
                  'N_ID' : ['A1', 'A2', 'A3','A4', 'A5']}) 


N_ID    P_ID    Text
0   A1  1   This person num is (111)888-8780 and other
1   A2  2   dont block 23 here
2   A3  3   two numbers: 001-002-1234 and here
3   A4  4   block this (666)6636666
4   A5  5   1-510-999-9999 is one more

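Goal

1) Block all seven-digit (or longer) numbers, including those with a parenthesized part, e.g. (111)888-8780 and (666)6636666 become **Block**

2) Avoid blocking numbers that are not seven digits long, e.g. 23

3) Create a new column

Attempt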

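# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
df['New'] = df['Text'].str.replace(r'((?:[\d]-?){7,})', '**block**', regex=True)

Output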

    N_ID P_ID Text New
0                  This person num is (111)**block** and other
1                  dont block 23 here
2                  two numbers: **block** and here
3                  block this (666)**block**
4                   **block** is one more

But this does not fully block (111)888-8780 and (666)6636666: the parenthesized part, e.g. (111), is left in place.
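To see why the parenthesized part survives, it can help to look at what the attempted pattern actually matches. The following is a minimal sketch using the standard `re` module rather than pandas; the helper name `attempted_matches` is hypothetical and only serves the illustration.

```python
import re

# The pattern from the attempt: seven or more digits,
# each optionally followed by a single hyphen.
ATTEMPT = re.compile(r'((?:[\d]-?){7,})')

def attempted_matches(text):
    """Return every substring the attempted pattern would replace."""
    return ATTEMPT.findall(text)

samples = [
    'This person num is (111)888-8780 and other',
    'dont block 23 here',
    'block this (666)6636666',
]

for s in samples:
    # The closing parenthesis is neither a digit nor a hyphen, so it ends
    # the run after only three digits and "(111)" is never part of a match.
    print(repr(s), '->', attempted_matches(s))
```

The match only starts after the closing parenthesis, because `)` is not allowed by the pattern and the three digits before it are too short to qualify on their own.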

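Question

How do I adjust str.replace(r'((?:[\d]-?){7,}) so that the digits in parentheses, e.g. (111), are blocked as well?

1 Answer:

Answer 0 (score: 1)

One possibility is to put every character you want to remove into a single character class.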

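# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
df['New'] = df['Text'].str.replace(r'[()\d-]{7,}', '**block**', regex=True)

Here, the character class contains parentheses, digits, and the hyphen. A match requires a run of at least seven of these characters in a row. This returns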

df['New']
Out[14]: 
0    This person num is **block** and other
1                        dont block 23 here
2           two numbers: **block** and here
3                      block this **block**
4                     **block** is one more
Name: New, dtype: object
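One thing to keep in mind with the character-class approach is that parentheses and hyphens count toward the seven characters, so a short fragment such as (12)-34 would also be blocked. The sketch below compares the answer's pattern with a stricter variant that counts only digits. The stricter pattern and the helper names `block_loose` and `block_strict` are assumptions made for illustration, not part of the original answer, and the variant tolerates only a single separator character between two digits.

```python
import re

# Pattern from the answer: any run of 7+ digits, parentheses or hyphens.
LOOSE = re.compile(r'[()\d-]{7,}')

# Hypothetical stricter variant: an optional opening parenthesis, then at
# least seven digits, with at most one separator character between digits.
STRICT = re.compile(r'\(?\d(?:[()-]?\d){6,}')

def block_loose(text):
    return LOOSE.sub('**block**', text)

def block_strict(text):
    return STRICT.sub('**block**', text)

for s in ['This person num is (111)888-8780 and other',
          'block this (666)6636666',
          'dont block 23 here',
          'see (12)-34 ok']:
    print(block_loose(s), '|', block_strict(s))
```

If the stricter behaviour is wanted in the DataFrame, the same pattern string should work with `df['Text'].str.replace(pattern, '**block**', regex=True)`.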