Question

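I recently got a book about Python that has a chapter on regex, and there is a piece of code in it that I can't understand. Can someone explain what is happening here (is this part about regex groups)?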

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> y.groupdict('zip')
{'zip': 'Zip: 10010'}
>>> y.group(2)
'State: NY'

Answer 1

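The regular expression, piece by piece:

(?P<zip>...)

Creates a named group called "zip".

Zip:\s*

Matches "Zip:" followed by zero or more whitespace characters.

\d

Matches a digit.

\w

Matches a word character, [A-Za-z0-9_] for ASCII text.

y.groupdict('zip')

The groupdict method returns a dictionary with the named groups as keys and their matches as values. Here "zip" is the only named group, so the result is {'zip': 'Zip: 10010'}. Note that the argument is not a group selector: it is the default value used for named groups that did not take part in the match.

y.group(2)

Returns the match of the second group, which is an unnamed group "(...)".

Hope that helps.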
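
To make the role of the groupdict argument concrete, here is a minimal sketch, assuming Python 3. The optional "state" group and the names pattern, m, no_default, with_default, zip_only and word_chars are additions for illustration and are not part of the book's code.

```python
import re

# Pattern from the question, with a hypothetical optional named group
# added so that the effect of groupdict's default argument is visible.
pattern = r'(?P<zip>Zip:\s*\d{5})(?:\s*(?P<state>State:\s*\w\w))?'

# This input has no "State:" part, so the "state" group does not participate.
m = re.search(pattern, "Zip: 10010")

# groupdict() returns every named group; unmatched groups get the default.
no_default = m.groupdict()         # the default is None
with_default = m.groupdict('n/a')  # the default is 'n/a'

# group('zip') is the call that selects a single named group.
zip_only = m.group('zip')

# \w matches letters, digits and the underscore, but not whitespace.
word_chars = re.findall(r'\w', "a_1 b")

print(no_default)
print(with_default)
print(zip_only)
print(word_chars)
```

If every named group matches, as in the question, the default never shows up, which is why groupdict('zip') and groupdict() give the same result there.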

Answer 2

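The search method returns an object containing the result of matching your regex pattern (or None if nothing matches).

groupdict returns a dictionary of groups whose keys are the group names defined with (?P<name>...). Here name is the name of the group.

group returns the matched groups by number. "State: NY" is your third group if you count the whole match: group(0) is the entire matched string, group(1) is "Zip: 10010", and group(2) is "State: NY".

By the way, this is a relatively simple question. I just looked up the method documentation on Google and found this page. Google is your friend.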
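
The group numbering can be checked directly. This is a small sketch, assuming Python 3 and the pattern and input from the question; the variable names other than my_regex and y are made up for illustration.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

whole = y.group(0)       # the entire matched text
first = y.group(1)       # first parenthesised group (also named "zip")
second = y.group(2)      # second parenthesised group (unnamed)
all_groups = y.groups()  # tuple of groups 1..n; group 0 is not included
named = y.groupdict()    # named groups only

print(whole)
print(first)
print(second)
print(all_groups)
print(named)
```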

Answer 3

import re

# r'...' marks a raw string; without it you would need to double every backslash
# ( ... ) groups (and captures) part of the pattern
# ? usually makes the preceding item optional, but right after an opening
#   bracket, as in (?P<zip>...), it introduces an extension: here, a named group
# * means "zero or more of the preceding item, as many as possible"
# \s is a whitespace character
# \d is a digit; [0-9] also works
# \w is a word character: alphanumeric or underscore
my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
addrs = "Zip: 10010 State: NY"

# Searches the string for the first place where the pattern matches
y = re.search(my_regex, addrs)
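
The point about raw strings can be illustrated with a short sketch, assuming Python 3. The \b word-boundary pattern and all variable names here are chosen for illustration and do not appear in the question.

```python
import re

# A raw string keeps backslashes as they are, so these two spell the same pattern.
raw = r'\d\s'
escaped = '\\d\\s'
same = (raw == escaped)

# Without the r prefix, Python interprets some escapes itself:
# '\b' becomes a single backspace character before the regex engine sees it.
plain_len = len('\b')  # one character (backspace)
raw_len = len(r'\b')   # two characters (backslash and "b")

# So the raw pattern finds a word boundary, while the plain one looks for
# literal backspace characters and finds nothing.
found = re.search(r'\bNY\b', "State: NY")
missed = re.search('\bNY\b', "State: NY")

print(same, plain_len, raw_len)
print(found.group(0), missed)
```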

Answer 4

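The (?P<identifier>match) syntax is Python's way of implementing named capture groups. That way, you can access what match matched by name instead of only by sequence number.

Since the first set of parentheses is named zip, you can access its match through the match object's groupdict method, which gives you {identifier: match} pairs. Or you can use y.group('zip') if you are only interested in the match (which usually makes sense, since you already know the identifier). You can also access the same match by its sequence number (1). The next match is unnamed, so the only way to access it is by its number.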
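
As a quick check of the name and number equivalence, here is a sketch assuming Python 3.6 or later (needed for the y['zip'] subscript form). The pattern and input come from the question; the remaining variable names are illustrative.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

by_name = y.group('zip')  # access by identifier
by_number = y.group(1)    # the same group, by sequence number
by_subscript = y['zip']   # subscript shorthand for y.group('zip')

# The compiled pattern records which number each name refers to.
name_to_number = dict(y.re.groupindex)

# The second group has no name, so asking for one by name fails.
try:
    y.group('state')
    unnamed_lookup_failed = False
except IndexError:
    unnamed_lookup_failed = True

print(by_name, by_number, by_subscript)
print(name_to_number, unnamed_lookup_failed)
```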

Answer 5

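Adding to the previous answers: in my opinion, you are better off choosing one type of group (named or unnamed) and sticking with it. I usually use named groups. For example: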

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(?P<state>State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> print(y.groupdict())
{'zip': 'Zip: 10010', 'state': 'State: NY'}

Answer 6

strfriend is your friend:

http://strfriend.com/vis?re=(Zip%3A\s*\d\d\d\d\d)\s*(State%3A\s*\w\w)

Edit: why does it make the whole line a link in the actual comment, but not in the preview?
