Block seven-digit numbers in strings in pandas

Time: 2019-08-07 01:12:08

Tags: regex python-3.x pandas text replace

Background

I have the following sample df

import pandas as pd
df = pd.DataFrame({'Text':['This person num is 111-888-8888 and other',
                          'dont block 23 here',
                          'two numbers: 001-002-1234 and some other 123-123-1234 here',
                          'block this 666-666-6666',
                           '1-510-999-9999 is one more'], 
                  'P_ID': [1,2,3,4,5],
                  'N_ID' : ['A1', 'A2', 'A3','A4', 'A5']}) 

    N_ID    P_ID    Text
0   A1  1   This person num is 111-888-8888 and other
1   A2  2   dont block 23 here
2   A3  3   two numbers: 001-002-1234 and some other 123-1...
3   A4  4   block this 666-666-6666
4   A5  5   1-510-999-9999 is one more

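Goal

1) Block all seven-digit numbers, e.g. 111-888-8888 becomes **Block**

2) Avoid blocking numbers that are not seven digits, e.g. 23

3) Create a new column

Attempt

I tried the following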

df['New_Text'] = df['Text'].str.replace(r'\d+', '**Block**', regex=True)

But it blocks all numbers

Also tried

I also tried changing \d+ to many other versions, e.g. /^\d{7}$/ taken from Regexp exactly seven digits, ^[0-9]{7} taken from Regex to match "<seven digits> - <filename>" with only one set of seven digits, and \b[0-9]{7}(?![0-9]) taken from REGEX To get seven numbers in a row?, but none of them worked.
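
For reference, a minimal sketch of why these three patterns leave the sample text unchanged, using only the standard `re` module. The names `samples`, `patterns`, `match_counts` and `longest_run` are introduced here for illustration. The anchored patterns can only match at the start of the string, and all three need seven digits in a row, whereas the numbers in the sample data are split by hyphens into short runs.

```python
import re

samples = [
    'This person num is 111-888-8888 and other',
    'dont block 23 here',
    '1-510-999-9999 is one more',
]

# The three patterns from the linked questions (JavaScript-style slashes removed).
patterns = {
    'anchored_both': r'^\d{7}$',
    'anchored_start': r'^[0-9]{7}',
    'word_boundary': r'\b[0-9]{7}(?![0-9])',
}

# Count how many sample strings each pattern matches anywhere in the text.
match_counts = {
    name: sum(bool(re.search(pat, text)) for text in samples)
    for name, pat in patterns.items()
}
print(match_counts)

# Length of the longest unbroken run of digits across all samples.
longest_run = max(len(run) for text in samples for run in re.findall(r'\d+', text))
print(longest_run)
```

None of the patterns match any sample, and the longest unbroken digit run is four characters, so a working pattern has to allow hyphens between the digits.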

Desired output

    N_ID P_ID Text  New_Text
0                   This person num is **Block** and other
1                   dont block 23 here
2                   two numbers: **Block**  and some other **Block** 
3                   block this **Block** 
4                   1-**Block**  is one more

Question

How do I adjust my code to achieve the desired output?

1 answer:

Answer 0: (score: 2)

You can try this regex: ((?:[\d]-?){7,})

Regex Demo

The final code is this

df['New_Text'] = df['Text'].str.replace(r'((?:[\d]-?){7,})', '**Block**', regex=True)
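
As a self-contained sketch, the snippet below runs that pattern on the sample data from the question. It assumes a recent pandas version, where `str.replace` needs `regex=True` to treat the pattern as a regular expression. Because `{7,}` also consumes the leading `1-` in the last row, a second column, `Strict_Text`, uses the stricter pattern `\d{3}-\d{3}-\d{4}`. That pattern and column name are not from the answer; they rest on the assumption that every number to block has the 3-3-4 shape.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['This person num is 111-888-8888 and other',
                            'dont block 23 here',
                            'two numbers: 001-002-1234 and some other 123-123-1234 here',
                            'block this 666-666-6666',
                            '1-510-999-9999 is one more']})

# Pattern from the answer: seven or more digits, each optionally followed by a hyphen.
df['New_Text'] = df['Text'].str.replace(r'((?:[\d]-?){7,})', '**Block**', regex=True)

# Hypothetical stricter variant (not from the answer): exactly the 3-3-4 digit shape.
df['Strict_Text'] = df['Text'].str.replace(r'\d{3}-\d{3}-\d{4}', '**Block**', regex=True)

print(df['New_Text'].tolist())
print(df['Strict_Text'].tolist())
```

With these patterns the last row should come out as `**Block** is one more` in `New_Text` and as `1-**Block** is one more` in `Strict_Text`; the latter matches the desired output shown in the question.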