Question

The source string is:

# Python 3.4.3
s = r'abc123d, hello 3.1415926, this is my book'

Here is my pattern:

pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

However, re.search gives me the correct result:

m = re.search(pattern, s)
print(m)  # output: <_sre.SRE_Match object; span=(3, 6), match='123'>

re.findall, on the other hand, just returns a list of empty strings:

L = re.findall(pattern, s)
print(L)  # output: ['', '', '']

Why can't re.findall give me the expected list:

['123', '3.1415926']

Answer 1

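There are two things to note here:

re.findall returns the captured text when the pattern contains capturing groups
the r'\\.' part of your pattern matches two consecutive characters: a literal \ followed by any character other than a newline.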
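Both points can be checked against the question's own input. The sketch below uses re.finditer to print each full match next to what group 1 captured; the variable names pairs and captured are only illustrative.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'
# The pattern from the question: in a raw string, \\. means
# "a literal backslash followed by any character".
pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

# Pair every full match with the text captured by group 1.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(pattern, s)]
print(pairs)  # [('123', None), ('3', None), ('1415926', None)]

# findall reports only group 1, and shows a group that did not
# take part in the match as an empty string.
captured = re.findall(pattern, s)
print(captured)  # ['', '', '']
```

There is no backslash in s, so the group never participates, and the decimal point splits 3.1415926 into two separate matches.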

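See the findall reference:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.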
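A small sketch of the three cases the reference describes: no group, one group, and several groups. The sample text and variable names are made up for illustration.

```python
import re

text = 'pi is 3.14 and e is 2.71'  # hypothetical sample input

# No group: findall returns the full matches.
no_group = re.findall(r'\d+\.\d+', text)
print(no_group)    # ['3.14', '2.71']

# One group: findall returns only what the group captured.
one_group = re.findall(r'\d+\.(\d+)', text)
print(one_group)   # ['14', '71']

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r'(\d+)\.(\d+)', text)
print(two_groups)  # [('3', '14'), ('2', '71')]
```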

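Note that to make re.findall return just the match values, you can usually

remove redundant capturing groups (e.g. (a(b)c) -> abc)
convert all capturing groups into non-capturing ones (that is, replace ( with (?:), unless there are backreferences in the pattern that refer to the group values
use re.finditer instead ([x.group() for x in re.finditer(pattern, s)])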
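The backreference caveat is the case where re.finditer is the practical choice. As a hedged illustration (the doubled-character pattern and sample text are not from the question), group 1 below cannot be made non-capturing because \1 refers to it:

```python
import re

text = 'aabbcd'      # hypothetical sample input
doubled = r'(\w)\1'  # \1 needs group 1, so the group must stay capturing

# findall returns only the captured character.
captured_only = re.findall(doubled, text)
print(captured_only)  # ['a', 'b']

# finditer exposes the match object, so group() gives the whole match.
full_matches = [m.group() for m in re.finditer(doubled, text)]
print(full_matches)   # ['aa', 'bb']
```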

In your case, findall returned captured texts that were all empty, because the \\ inside the r'' string literal tried to match a literal \.

To match the numbers, you need to use

-?\d*\.?\d+

The regex matches:

-? - an optional minus sign
\d* - optional digits
\.? - an optional decimal separator
\d+ - one or more digits.
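To see how the four parts combine, the following sketch runs the pattern against a few hand-picked strings with re.fullmatch. The sample strings are assumptions chosen for illustration.

```python
import re

number = re.compile(r'-?\d*\.?\d+')

# Hypothetical sample strings, each tested as a whole.
samples = ['42', '-7', '3.14', '.5', '-.25', '5.', 'abc']
results = {x: bool(number.fullmatch(x)) for x in samples}

for x in samples:
    print(repr(x), results[x])
# '42' True, '-7' True, '3.14' True, '.5' True, '-.25' True,
# '5.' False (the pattern must end in a digit), 'abc' False
```

Unlike the pattern in the question, this one does not accept a trailing decimal point such as '5.', because it ends in \d+.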

Here is a demo:

import re
s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?\d*\.?\d+'
L = re.findall(pattern, s)
print(L)

Answer 2

import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))

You don't need to escape twice when you are using raw mode.

Output: ['123', '3.1415926']
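A short sketch of why the double backslash matters in raw mode. It compares the string literals themselves and then their behaviour as patterns; the variable names are illustrative only.

```python
import re

# In a raw string one backslash is enough; in a normal string it must be doubled.
same_pattern = (r'\.' == '\\.')
print(same_pattern)  # True

# Doubling the backslash inside a raw string adds a character to the pattern.
lengths = (len(r'\.'), len(r'\\.'))
print(lengths)  # (2, 3)

# r'\.' matches a literal dot; r'\\.' needs a literal backslash first.
dot_matches = re.findall(r'\.', '3.14')
backslash_matches = re.findall(r'\\.', '3.14')
print(dot_matches)        # ['.']
print(backslash_matches)  # []
```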

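Also, the return type will be a list of strings. If you want integers and floats instead, use map:

import re
import ast
s = r'abc123d, hello 3.1415926, this is my book'
# In Python 3, map returns an iterator, so wrap it in list() before printing
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))

Output: [123, 3.1415926]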

Answer 3

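Just to explain why you think search returned what you wanted and findall didn't:

search returns an SRE_Match object that holds some information, such as:

string: attribute holding the string that was passed to the search function.
re: the REGEX object used in the search function.
groups(): tuple of the strings captured by the capturing groups inside the REGEX.
group(index): retrieves the string captured by the group with the given index > 0.
group(0): returns the string matched by the whole REGEX.

search stops at the first match, builds the SRE_Match object for it, and returns it. Check this code: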

import re

s = r'abc123d'
pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
m = re.search(pattern, s)
print(m.string)  # 'abc123d'
print(m.group(0))  # the REGEX matched '123'
print(m.groups())  # the only group, (\.[0-9]*), took no part in the match, so this returns (None,)

s = ', hello 3.1415926, this is my book'
m2 = re.search(pattern, s)
print(m2.string)    # ', hello 3.1415926, this is my book'
print(m2.group(0))  # the REGEX matched '3.1415926'
print(m2.groups())  # the capturing group captured '.1415926', so this returns ('.1415926',)

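findall behaves differently: it doesn't stop at the first match but keeps extracting matches until the end of the text. However, if the REGEX contains at least one capturing group, findall doesn't return the matched string; it returns the strings captured by the capturing groups:

import re

s = r'abc123d , hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
m = re.findall(pattern, s)
print(m)  # ['', '.1415926']

The first element was returned when the first match, '123', was found: the capturing group captured nothing there, so findall reports ''. The second element comes from the second match, '3.1415926', where the capturing group matched the part '.1415926'.

If you want findall to return the matched strings, make every capturing group () in your REGEX a non-capturing group (?:).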
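As a sketch of that change, the only difference from the pattern above is that ( becomes (?: in the single group. The names non_capturing and matches are illustrative.

```python
import re

s = r'abc123d , hello 3.1415926, this is my book'
# Same pattern as above, with the one capturing group made non-capturing.
non_capturing = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'

matches = re.findall(non_capturing, s)
print(matches)  # ['123', '3.1415926']
```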

re.findall behaves weird
