Question

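What is the difference between the search() and match() functions in the Python re module?

I've read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return to with my question, and re-learning it will take less time.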

Answer 1

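re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.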

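re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.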

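The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The "match" operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code: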

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

# flags are set once, when the pattern is compiled
m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# the second argument of a compiled pattern's search() is pos, not flags,
# so re.MULTILINE is not passed again here
print(m.search(string_with_newlines))  # also matches

Answer 2

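search ⇒ finds something anywhere in the string and returns a match object.

match ⇒ finds something at the beginning of the string and returns a match object.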
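To make the two one-liners above concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration.

```python
import re

text = "say hello"

# search looks at every position, so it finds "hello" at index 4.
found = re.search("hello", text)
print(found.span())                   # (4, 9)

# match only tries position 0, where the text starts with "say".
print(re.match("hello", text))        # None

# match succeeds when the pattern fits at the very start.
print(re.match("say", text).span())   # (0, 3)
```

Both functions return None on failure, so the usual idiom is to test the result with `if` before calling methods on it.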

Answer 3

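re.search searches for the pattern throughout the string, whereas re.match does not search for the pattern; it has no choice but to match it at the start of the string, and fails if it cannot.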

Answer 4

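The difference is that re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern') is roughly equivalent to re.search('^pattern') (outside MULTILINE mode, where ^ also matches after a newline). So it anchors the left side of a pattern. But it does not anchor the right side as well: that still requires a terminating $.

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know the reasons it should be retained.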
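A small sketch of the anchoring behaviour described in this answer. The sample strings are invented; re.fullmatch is a standard function available since Python 3.4 and is shown only as one way to anchor both ends.

```python
import re

# match anchors only the left side: trailing text is allowed.
print(re.match("abc", "abcdef").group())      # abc

# A trailing $ or re.fullmatch anchors the right side as well.
print(re.match("abc$", "abcdef"))             # None
print(re.fullmatch("abc", "abcdef"))          # None
print(re.fullmatch("abc", "abc").group())     # abc

# In MULTILINE mode ^ also matches right after a newline, so
# search('^...') can succeed where match fails.
text = "first\nsecond"
print(re.match("second", text, re.MULTILINE))             # None
print(re.search("^second", text, re.MULTILINE).group())   # second

# \A always means "start of the string", which is closer to what match does.
print(re.search(r"\Asecond", text, re.MULTILINE))         # None
```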

Answer 5

You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t = re.match("[a-z]+", a)   # None: the string starts with digits
t = re.search("[a-z]+", a)  # match object for "abc"

re.match will return None, but re.search will return a match object for abc.

Answer 6

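match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you really gain.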

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words), which gave me the following plot:

[Plot: search and match timings for 1M to 10M words]

The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: avoid over-optimizing your code.
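If you want to repeat the comparison on your own machine, here is a smaller sketch using the standard timeit module instead of time.time(). The word list, the precompiled patterns, and the repetition count are my own choices, not part of the measurements above, and the timings will differ from machine to machine.

```python
import re
import timeit

# A tiny, fixed word list (an assumption for illustration only).
words = ["xxpythonxx", "abcdefghij", "python", "nothinghere"] * 1000

search_re = re.compile("python")
match_re = re.compile("(.*?)python(.*?)")

# Both approaches should flag exactly the same words.
search_hits = [bool(search_re.search(w)) for w in words]
match_hits = [bool(match_re.match(w)) for w in words]
assert search_hits == match_hits

# Time each approach; absolute numbers depend on the machine.
search_time = timeit.timeit(lambda: [search_re.search(w) for w in words], number=20)
match_time = timeit.timeit(lambda: [match_re.match(w) for w in words], number=20)
print(f"search: {search_time:.3f}s  match: {match_time:.3f}s")
```

The equality check matters more than the timings: it confirms that the match hack finds the same words as plain search before you compare speed.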

Answer 7

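re.match attempts to match the pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.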
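As a quick illustration of "until it finds a match": search returns only the first (leftmost) match. The sample string below is made up, and findall is shown just for contrast.

```python
import re

text = "a1b22c333"

# search stops at the first (leftmost) match, here the single digit "1".
first = re.search(r"\d+", text)
print(first.group(), first.span())   # 1 (1, 2)

# match gives up immediately because the string starts with a letter.
print(re.match(r"\d+", text))        # None

# To collect every match instead of only the first, use findall.
print(re.findall(r"\d+", text))      # ['1', '22', '333']
```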

Answer 8

Much shorter:

search scans through the whole string.
match looks only at the beginning of the string.

The following example shows it:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'
