Background
I have the following df, which is a modification of the one in "blocking seven digit numbers in string pandas":
import pandas as pd
df = pd.DataFrame({'Text':['This person num is (111)888-8780 and other',
'dont block 23 here',
'two numbers: 001-002-1234 and here',
'block this (666)6636666',
'1-510-999-9999 is one more'],
'P_ID': [1,2,3,4,5],
'N_ID' : ['A1', 'A2', 'A3','A4', 'A5']})
N_ID P_ID Text
0 A1 1 This person num is (111)888-8780 and other
1 A2 2 dont block 23 here
2 A3 3 two numbers: 001-002-1234 and here
3 A4 4 block this (666)6636666
4 A5 5 1-510-999-9999 is one more
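Goal
1) Block all seven-digit numbers that include parentheses, e.g. (111)888-8780 and (666)6636666, so that each becomes **Block**
2) Avoid blocking numbers that are not seven digits long, e.g. 23
3) Create a new column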
Attempt
df['New'] = df['Text'].str.replace(r'((?:[\d]-?){7,})','**block**', regex=True)
Output
N_ID P_ID Text New
0 This person num is (111)**block** and other
1 dont block 23 here
2 two numbers: **block** and here
3 block this (666)**block**
4 **block** is one more
But this does not fully block (111)888-8780 and (666)6636666: the parenthesized prefix is left in place.
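To see why the parentheses survive, it can help to look at what the pattern actually matches. The following is a minimal sketch using the standard `re` module rather than pandas; the variable names are only illustrative. The parenthesized part holds just three digits and the closing parenthesis interrupts the run, so the match can only start after it.

```python
import re

# The pattern from the attempt: a digit optionally followed by a hyphen,
# repeated at least seven times.
pattern = re.compile(r'((?:[\d]-?){7,})')

text = 'This person num is (111)888-8780 and other'

# "(111)" contributes only three digits and ")" is not matched by the
# pattern, so the first possible match starts after the parenthesis.
matches = pattern.findall(text)
print(matches)  # ['888-8780']
```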
Question
How do I tweak the pattern r'((?:[\d]-?){7,})' passed to str.replace so that numbers with a parenthesized part, e.g. (111), are fully blocked?
Answer 0 (score: 1)
One possibility is to include all the characters you want removed in a character class.
df['New'] = df['Text'].str.replace(r'[()\d-]{7,}','**block**', regex=True)
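Here the character class contains parentheses, digits, and hyphens. A run of at least seven of these characters must occur for a match. This returns: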
df['New']
Out[14]:
0 This person num is **block** and other
1 dont block 23 here
2 two numbers: **block** and here
3 block this **block**
4 **block** is one more
Name: New, dtype: object
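One caveat is that the character class counts parentheses and hyphens toward the seven characters, so a run such as (1)-(2)- would also be blocked even though it contains only two digits. If that matters, a stricter variant might be used instead. The sketch below is an assumption on my part and not part of the answer above: the pattern `PHONE_LIKE` and the helper `block_numbers` are hypothetical names, and the code uses the standard `re` module so it runs without pandas.

```python
import re

# Hypothetical stricter pattern (an assumption, not the answer's pattern):
# an optional parenthesized digit prefix such as "(111)", followed by at
# least seven digits, each optionally followed by a hyphen.
PHONE_LIKE = re.compile(r'(?:\(\d+\))?(?:\d-?){7,}')

def block_numbers(text, mask='**block**'):
    """Replace every phone-like run in text with mask."""
    return PHONE_LIKE.sub(mask, text)

texts = [
    'This person num is (111)888-8780 and other',
    'dont block 23 here',
    'two numbers: 001-002-1234 and here',
    'block this (666)6636666',
    '1-510-999-9999 is one more',
]

blocked = [block_numbers(t) for t in texts]
for line in blocked:
    print(line)
```

With pandas, the same pattern string should work as df['Text'].str.replace(PHONE_LIKE.pattern, '**block**', regex=True). On the five sample rows it gives the same result as the character-class version; the difference only shows up on inputs made mostly of parentheses and hyphens.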