I have the following text:
LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : email@gmail.com
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5
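I need to get the text that follows the phone numbers, but a record can list home, cell, emergency, fax, or work phones in any order. Is there a regular expression that gives me the text after the last phone number? For example, in the second block I want the text that comes after Cell Phone: 888-888-8888.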
Answer 0 (score: 2)
In [1]: import re
In [2]: s=""" LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
...: 192 generic St.
...: Newton MA 02471
...: Status: Attender Marital: Married Adult: M/F: Env.No.:
...:
...:
...: LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
...: 10 generic St.
...: Newton MA 02471
...:
...: E-mail : email@gmail.com
...: Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""
In [3]:
In [4]: re.findall('[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s, re.MULTILINE)
Out[4]: ['192 generic St. ', '10 generic St. ']
NODE EXPLANATION
-----------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
-----------------------------------------------------
- '-'
-----------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
-----------------------------------------------------
- '-'
-----------------------------------------------------
[0-9]{4} any character of: '0' to '9' (4 times)
-----------------------------------------------------
\n '\n' (newline)
-----------------------------------------------------
( group and capture to \1:
-----------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
------------------------------------------------------
) end of \1
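One thing to watch for: the pattern above only matches when the phone number is immediately followed by the newline. A minimal sketch, assuming the real data may have trailing spaces or tabs after the last number on a line; the names `PHONE`, `pattern`, and `lines_after_phone` are made up for illustration:

```python
import re

# Sample text copied from the question.
text = """LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : email@gmail.com
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""

# Same phone pattern as in the answer.
PHONE = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"

# [ \t]* tolerates trailing blanks (an assumption about the real data).
# Requiring \n right after means only the LAST number on a line can match.
pattern = re.compile(PHONE + r"[ \t]*\n(.*)")

# strip() drops any trailing whitespace from the captured line.
lines_after_phone = [m.group(1).strip() for m in pattern.finditer(text)]
print(lines_after_phone)
```

Because a number in the middle of a line (such as the Home Phone in the second record) is followed by more text rather than a newline, it is skipped, and only the line after the last number is captured.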
Answer 1 (score: 1)
Is this what you want?
doc = '''LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : email@gmail.com
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5'''
import re
p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)')
for x in p.finditer(doc):
    # group(1) is the line that follows the last phone number
    print(x.group(1))
Output
192 generic St.
10 generic St.
Explanation
[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________        <- phone number
                          __      <- newline
                            ____  <- this part is group(1)
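If you actually want everything after the last phone number rather than just the next line, here is a rough sketch. It assumes every record ends with a line starting with "Status:", as in the sample; the names `block_pattern` and `blocks` are hypothetical:

```python
import re

# Sample text copied from the question.
doc = """LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:
LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471
E-mail : email@gmail.com
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""

# DOTALL lets "." cross newlines; MULTILINE lets "^" match at each line start.
# The lazy (.*?) stops at the first following line that begins with "Status:"
# (an assumption about how records are terminated).
block_pattern = re.compile(
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}[ \t]*\n(.*?)(?=^Status:)",
    re.DOTALL | re.MULTILINE,
)

# One list of lines per record.
blocks = [m.group(1).strip().splitlines() for m in block_pattern.finditer(doc)]
for block in blocks:
    print(block)
```

For the sample this gives the two address lines for the first record, and the two address lines plus the e-mail line for the second.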