Background
I have the following sample df
import pandas as pd
df = pd.DataFrame({'Text': ['This person num is 111-888-8888 and other',
                            'dont block 23 here',
                            'two numbers: 001-002-1234 and some other 123-123-1234 here',
                            'block this 666-666-6666',
                            '1-510-999-9999 is one more'],
                   'P_ID': [1, 2, 3, 4, 5],
                   'N_ID': ['A1', 'A2', 'A3', 'A4', 'A5']})
N_ID P_ID Text
0 A1 1 This person num is 111-888-8888 and other
1 A2 2 dont block 23 here
2 A3 3 two numbers: 001-002-1234 and some other 123-1...
3 A4 4 block this 666-666-6666
4 A5 5 1-510-999-9999 is one more
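Goal
1) Block all seven-digit numbers, e.g. 111-888-8888,
so that they become **Block**
2) Avoid blocking numbers that are not seven digits, e.g. 23
3) Create a new column
Attempt
I tried the following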
df['New_Text'] = df['Text'].str.replace(r'\d+','**Block**')
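But this blocks all numbers, including the 23 in row 1.
Other attempts
I also tried changing \d+ to many other patterns, e.g. /^\d{7}$/ (taken from "Regexp exactly seven digits"), ^[0-9]{7} (taken from "Regex to match "<seven digits> - <filename>" with only one set of seven digits"), and \b[0-9]{7}(?![0-9]) (taken from "REGEX To get seven numbers in a row?"), but none of them work.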
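A likely reason these attempts match nothing: each one requires seven digits in a row, while the sample numbers are written as hyphen-separated groups of 3, 3 and 4 digits. The sketch below checks this on the first sample row with the standard `re` module. The names `text`, `attempts` and `results` are only for illustration.

```python
import re

text = 'This person num is 111-888-8888 and other'

# The three patterns from the attempts above, in Python syntax
# (the surrounding /.../ delimiters are not part of the pattern).
attempts = [r'^\d{7}$', r'^[0-9]{7}', r'\b[0-9]{7}(?![0-9])']

# Each pattern needs seven consecutive digits; the longest run of
# consecutive digits in the text is four, so every search returns None.
results = {pattern: re.search(pattern, text) for pattern in attempts}

for pattern, match in results.items():
    print(pattern, '->', match)
```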
Desired output
N_ID P_ID Text New_Text
0 This person num is **Block** and other
1 dont block 23 here
2 two numbers: **Block** and some other **Block**
3 block this **Block**
4 1-**Block** is one more
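Question
How do I adjust my code to achieve the desired output?
Answer 0 (score: 2)
You can try this regex: ((?:[\d]-?){7,}). It matches a run of seven or more digits in which each digit may be followed by a hyphen.
The final line of code is this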
df['New_Text'] = df['Text'].str.replace(r'((?:[\d]-?){7,})', '**Block**', regex=True)
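To check the pattern against the sample rows, here is a minimal sketch that uses plain `re.sub` on a list of the five strings, as a stand-in for `Series.str.replace` with regular expressions enabled. On the last row the answer's pattern also swallows the leading `1-`, which differs from the desired output in the question. The `strict_pattern` shown next to it is a hypothetical alternative, not part of the answer, and it assumes every number to block has the exact shape 3 digits, hyphen, 3 digits, hyphen, 4 digits.

```python
import re

texts = [
    'This person num is 111-888-8888 and other',
    'dont block 23 here',
    'two numbers: 001-002-1234 and some other 123-123-1234 here',
    'block this 666-666-6666',
    '1-510-999-9999 is one more',
]

# Pattern from the answer: seven or more digits, each optionally
# followed by a hyphen.
answer_pattern = r'((?:[\d]-?){7,})'

# Hypothetical stricter pattern (an assumption, not from the answer):
# exactly ddd-ddd-dddd.
strict_pattern = r'\d{3}-\d{3}-\d{4}'

answer_out = [re.sub(answer_pattern, '**Block**', t) for t in texts]
strict_out = [re.sub(strict_pattern, '**Block**', t) for t in texts]

# Rows 0 to 3 come out the same with both patterns; row 4 differs.
print(answer_out[4])  # **Block** is one more
print(strict_out[4])  # 1-**Block** is one more
```

If the stricter behavior is what you want, the same pattern string can be passed to `df['Text'].str.replace(..., regex=True)` in place of the one above.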