Question

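I'm confused by the regex groups in an example from the book "Automate the Boring Stuff with Python: Practical Programming for Total Beginners". The regular expressions are as follows: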

#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard
# Data pasted from: https://www.nostarch.com/contactus.html
import pyperclip, re

phoneRegex = re.compile(r'''(
     (\d{3}|\(\d{3}\))?              # area code
     (\s|-|\.)?                      # separator
     (\d{3})                         # first 3 digits
     (\s|-|\.)                       # separator  
     (\d{4})                         # last 4 digits
     (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
     )''', re.VERBOSE )

# TODO: Create email regex.

emailRegex = re.compile(r'''(
     [a-zA-Z0-9._%+-]+               # username
      @                              # @ symbol
     [a-zA-Z0-9.-]+                  # domain name
     (\.[a-zA-Z]{2,4})               # dot-something
     )''', re.VERBOSE)
# TODO: Find matches in clipboard text.

text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
print(groups[0])
for groups in emailRegex.findall(text):
    matches.append(groups[0])

# TODO: Copy results to the clipboard.

if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone number or email addresses found.')

I'm confused about groups[1] / groups[2] ... / groups[8], and about how many groups there are in phoneRegex. What is the difference between groups() and groups[]?

The pasted data comes from: [https://www.nostarch.com/contactus.html]

Answer 1

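In a regular expression, parentheses () create what is called a capturing group. Each group is assigned a number, starting from 1.

For example: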

In [1]: import re

In [2]: m = re.match('([0-9]+)([a-z]+)', '123xyz')

In [3]: m.group(1)
Out[3]: '123'

In [4]: m.group(2)
Out[4]: 'xyz'

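Here, ([0-9]+) is the first capturing group and ([a-z]+) is the second. When you apply the regex, the first capturing group ends up "capturing" the string 123 (since that is the part it matches), and the second captures xyz.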

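With findall, the string is searched for every place where the regex matches, and for each match the captured groups are returned as a tuple, so the overall result is a list of tuples. I encourage you to play around with it in ipython a bit to get a feel for how it works. Also have a look at the documentation for the re module.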
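As a small sketch of that behaviour, the snippet below reuses the two-group pattern from the session above on a made-up input string ('123xyz 45ab' is only an illustration, not data from the question):

```python
import re

# Two capturing groups: digits, then lowercase letters.
pattern = re.compile(r'([0-9]+)([a-z]+)')

# With more than one group, findall returns one tuple per match,
# holding the text captured by each group in order.
pairs = pattern.findall('123xyz 45ab')
print(pairs)  # [('123', 'xyz'), ('45', 'ab')]

# With no groups at all, findall returns the whole matched strings instead.
whole = re.findall(r'[0-9]+[a-z]+', '123xyz 45ab')
print(whole)  # ['123xyz', '45ab']
```

This is why the book's pattern wraps everything in one outer pair of parentheses: it makes the full phone number available as the first element of each tuple.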

Answer 2

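A regular expression can contain groups. They are denoted by (). Groups can be used to extract the parts of a match that might be useful.

For example, the phone number regex contains 9 groups: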

Group  Subpattern
1      ((\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext.)\s*(\d{2,5}))?)
2      (\d{3}|\(\d{3}\))?
3      (\s|-|\.)?
4      (\d{3})
5      (\s|-|\.)
6      (\d{4})
7      (\s*(ext|x|ext.)\s*(\d{2,5}))?
8      (ext|x|ext.)
9      (\d{2,5})

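Notice how each group is enclosed in ()s. Groups are numbered by the position of their opening parenthesis, counting from left to right.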
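One way to check the group count and numbering yourself is sketched below. The pattern is the phone regex from the question (with the quantifier written as {2,5}); the variable name phone_regex and the sample sentence are made up for illustration.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

# Pattern.groups is the number of capturing groups in the pattern.
print(phone_regex.groups)  # 9

m = phone_regex.search('Call 415-555-1234 x99 today')

# group(0) is the whole match; groups 1..9 follow the order of
# their opening parentheses, as in the table above.
for n in range(1, phone_regex.groups + 1):
    print(n, repr(m.group(n)))
```

For this sample text, group 1 holds the whole number '415-555-1234 x99', group 2 the area code '415', and group 9 the extension digits '99'.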

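groups[x] simply refers to the string matched by a particular group. Because each tuple returned by findall is indexed from 0, groups[0] is the string matched by group 1, groups[1] is the string matched by group 2, and so on. That is why the code uses groups[1], groups[3], groups[5] and groups[8]: they are the area code, the first 3 digits, the last 4 digits and the extension digits.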
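To make the off-by-one concrete, and to contrast indexing a findall tuple with calling methods on a match object, here is a minimal sketch. It uses a hypothetical, much smaller pattern with only 3 groups rather than the book's regex:

```python
import re

# Hypothetical small pattern: an outer group plus two inner groups.
pattern = re.compile(r'((\d{3})-(\d{4}))')
text = '555-1234'

m = pattern.search(text)

# On a match object, group(n) uses the regex numbering
# (1-based; group(0) is the whole match).
print(m.group(1), m.group(2), m.group(3))  # 555-1234 555 1234

# m.groups() is a method call that returns groups 1..n as a tuple.
print(m.groups())  # ('555-1234', '555', '1234')

# findall yields the same kind of tuple for each match. In the question's
# loop, "groups" is just the variable name for that tuple, and tuples are
# indexed from 0, so groups[i] holds regex group i + 1.
for groups in pattern.findall(text):
    print(groups[0], groups[1], groups[2])  # 555-1234 555 1234
```

So groups() is a method of a match object, while groups[...] in the question is ordinary tuple indexing on whatever findall returned.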
