Question

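I have come up with a regular expression that works well for finding phone numbers.

I would like to take it a step further and use it on large blocks of text to identify matching strings that follow the words "cell" or "mobile" by at most 10 characters. I want it to return the numbers from Cell Phone: (954) 555-4444 and Mobile 555-777-9999, but not the number from Fax: (555) 444-6666.

Something like this (pseudocode):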

regex = re.compile(r'(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4})')
bigstring = # Some giant string added together from many globbed files
matches = regex.finditer(bigstring)
for match in matches:
    if match follows 'cell' or match follows 'mobile':
        print(match.group(0))

Answer 1

You can do it like this:

txt='''\
Call me on my mobile anytime: 555-666-1212 
The office is best at 555-222-3333 
Dont ever call me at 555-666-2345 '''

import re

# Capture the keyword and the phone number that follows it within 15 characters.
print(re.findall(r'(?:(mobile|office).{0,15}(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4}))', txt))

Prints:

[('mobile', '555-666-1212'), ('office', '555-222-3333')]
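
To adapt the same idea to the keywords in the question, here is a minimal sketch. It assumes matching should be case-insensitive, that the gap between keyword and number is at most 10 characters on the same line, and that a number may start with an opening parenthesis (the `\(?` is an assumption, not part of the original pattern). The function name `find_mobile_numbers` is hypothetical.

```python
import re

# Keyword ('cell' or 'mobile'), then up to 10 arbitrary characters,
# then the phone number, which is the only capturing group.
# Assumption: r'\(?' allows an optional opening parenthesis.
MOBILE_RE = re.compile(
    r'\b(?:cell|mobile).{0,10}?(\(?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4})',
    re.IGNORECASE,
)

def find_mobile_numbers(text):
    """Return phone numbers that follow 'cell' or 'mobile' within 10 characters."""
    return MOBILE_RE.findall(text)

sample = '''\
Cell Phone: (954) 555-4444
Mobile 555-777-9999
Fax: (555) 444-6666
'''

print(find_mobile_numbers(sample))
# ['(954) 555-4444', '555-777-9999']
```

Because `.` does not match a newline by default, the keyword and the number have to be on the same line; passing `re.DOTALL` would relax that if the text wraps.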

Answer 2

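You can do this with a regular expression. In the re documentation you will find that the pattern r'(?<=abc)def' matches 'def' only if it is preceded by 'abc'.

Similarly, r'Hello (?=World)' matches 'Hello ' only if it is followed by 'World'.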
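
A short sketch of both assertions, using the patterns quoted above. The variable names and test strings are only illustrative.

```python
import re

# Lookbehind: 'def' matches only when 'abc' comes immediately before it.
lookbehind = re.compile(r'(?<=abc)def')
print(lookbehind.search('abcdef').group(0))  # def
print(lookbehind.search('xyzdef'))           # None

# Lookahead: 'Hello ' matches only when 'World' comes immediately after it.
lookahead = re.compile(r'Hello (?=World)')
print(lookahead.findall('Hello World, Hello there'))  # ['Hello ']
```

Note that Python's re module requires a lookbehind to match a fixed-length string, so a variable gap such as "up to 10 characters" cannot go inside `(?<=...)`. For that case it is probably simpler to match the keyword and the gap normally and capture only the number.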
