Regular expression for a group of repeating digits

Time: 2014-05-01 13:47:56

Tags: python regex text

I have the following text:

LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5

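I need to get the text that follows the phone numbers, but a record can list Home Phone, Cell Phone, Emergency Phone, Fax, or Work Phone in any order. Is there a regular expression that gives me the text after the last phone number? For example, in the second block of text, I want the text that comes after Cell Phone: 888-888-8888.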

2 answers:

Answer 0 (score: 2)

In [1]: import re

In [2]: s=""" LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
   ...: 192 generic St.
   ...: Newton MA 02471
   ...: Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:
   ...:
   ...:
   ...: LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
   ...: 10 generic St.
   ...: Newton MA 02471
   ...:
   ...:     E-mail :    email@gmail.com
   ...: Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5"""

In [3]:

In [4]: re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s)
Out[4]: ['192 generic St. ', '10 generic St. ']

NODE         EXPLANATION
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{3}     any character of: '0' to '9' (3 times)
-----------------------------------------------------
  -            '-'
-----------------------------------------------------
  [0-9]{4}     any character of: '0' to '9' (4 times)
-----------------------------------------------------
  \n           '\n' (newline)
-----------------------------------------------------
  (            group and capture to \1:
-----------------------------------------------------
    .*           any character except \n (0 or more times
                 (matching the most amount possible))
------------------------------------------------------
  )            end of \1
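
Because the pattern requires `\n` immediately after the last digit, only a phone number that ends its line can match, which is why the Home Phone in the second record is skipped. One caveat is that trailing spaces after the number would break the match. The sketch below uses a made-up `sample` string (not from the question) and allows optional spaces or tabs before the newline; treat it as an illustration under that assumption rather than a required change.

```python
import re

# Hypothetical sample: the last phone number is followed by two trailing spaces.
sample = (
    "NAME_A    Home Phone: 777-777-2205    Cell Phone: 888-888-8888  \n"
    "10 generic St.\n"
)

# Original pattern: the newline must come directly after the last digit.
strict = re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', sample)

# Variant: allow spaces or tabs between the number and the newline.
tolerant = re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}[ \t]*\n(.*)', sample)

print(strict)
print(tolerant)
```

The Home Phone still does not match in the tolerant variant, because after its trailing spaces comes more text rather than a newline.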

Answer 1 (score: 1)

Is this what you want?

doc = '''LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514
192 generic St. 
Newton MA 02471
Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:


LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888
10 generic St. 
Newton MA 02471

    E-mail :    email@gmail.com
Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5'''

import re

p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)')

# group(1) holds the line that follows a phone number at the end of a line
for x in p.finditer(doc):
    print(x.group(1))

Output

192 generic St. 
10 generic St. 

Explanation

[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________          <- phone number
                          __        <- newline
                             __     <- this part is group(1)
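
If you want everything after the last phone number in each record, not only the next line, one possible approach is to split the text into records and slice each record after its final phone match. This is only a sketch: the helper name `text_after_last_phone` is made up for illustration, and it assumes records are separated by at least two blank lines (three or more consecutive newlines), as in the sample text.

```python
import re

doc = (
    "LAST_NAME_1, Firs_name_1    Home Phone: 333-336-6514\n"
    "192 generic St. \n"
    "Newton MA 02471\n"
    "Status: Attender    Marital:    Married Adult:  M/F:    Env.No.:\n"
    "\n"
    "\n"
    "LAST_NAME_2, Firs_name_2    Home Phone: 777-777-2205    Cell Phone: 888-888-8888\n"
    "10 generic St. \n"
    "Newton MA 02471\n"
    "\n"
    "    E-mail :    email@gmail.com\n"
    "Status: Member  Marital:    Married Adult:  Y   M/F:    M   Env.No.:    5"
)

PHONE = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')


def text_after_last_phone(record):
    """Return the text after the last phone number in record, or None."""
    matches = list(PHONE.finditer(record))
    if not matches:
        return None
    # Slice from the end of the final match and trim surrounding whitespace.
    return record[matches[-1].end():].strip()


# Assumption: two or more blank lines separate records.
records = re.split(r'\n{3,}', doc)
results = [text_after_last_phone(r) for r in records]

for text in results:
    print(text)
    print('---')
```

Slicing after the final match avoids depending on which label (Home, Cell, Fax, and so on) happens to come last.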